* [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
@ 2015-01-14 6:34 Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14 6:34 UTC (permalink / raw)
To: arnd, robh+dt, davem, grant.likely, agraf
Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
zhangfei.gao, netdev, devicetree, linux
v13:
- Fix the problem of alignment parameters for function and checkpatch warming.
v12:
- According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
for hip04 ethernet.
v11:
- Add ethtool support for tx coalecse getting and setting, the xmit_more
is not supported for this patch, but I think it could work for hip04,
will support it later after some tests for performance better.
Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
- Before:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
- After:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
v10:
- According Arnd's suggestion, remove the skb_orphan and use the hrtimer
for the cleanup of the TX queue and add some modification for the hip04
drivers.
1) drop the broken skb_orphan call
2) drop the workqueue
3) batch cleanup based on tx_coalesce_frames/usecs for better throughput
4) use a reasonable default tx timeout (200us, could be shorted
based on measurements) with a range timer
5) fix napi poll function return value
6) use a lockless queue for cleanup
v9:
- There is no tx completion interrupts to free DMAd Tx packets, it means taht
we rely on new tx packets arriving to run the destructors of completed packets,
which open up space in their sockets's send queues. Sometimes we don't get such
new packets causing Tx to stall, a single UDP transmitter is a good example of
this situation, so we need a clean up workqueue to reclaims completed packets,
the workqueue will only free the last packets which is already stay for several jiffies.
Also fix some format cleanups.
v8:
- Use poll to reclaim xmitted buffer as workaround since no tx done interrupt
v7:
- Remove select NET_CORE in 0002
v6:
- Suggest by Russell: Use netdev_sent_queue & netdev_completed_queue to solve latency issue
Also shorten the period of timer, which is used to wakeup the queue since no
tx completed interrupt.
v5:
- no big change, fix typo
v4:
- Modify accoringly to the suggetion from Arnd, Florian, Eric, David
Use of_parse_phandle_with_fixed_args & syscon_node_to_regmap get ppe info
Add skb_orphan() and tx_timer for reclaim since no tx_finished interrupt
Update timeout, and move of_phy_connect to probe to reuse open/stop
v3:
- Suggest from Arnd, use syscon & regmap_write/read to replace static void __iomem *ppebase.
Modify hisilicon-hip04-net.txt accrordingly to suggestion from Florian and Sergei.
v2:
- Got many suggestions from Russell, Arnd, Florian, Mark and Sergei
Remove memcpy, use dma_map/unmap_single, use dma_alloc_coherent rather than dma_pool, etc.
Refer property in ethernet.txt, change ppe description, etc.
Ding Tianhong (1):
net: hisilicon: new hip04 ethernet driver
Zhangfei Gao (2):
Documentation: add Device tree bindings for Hisilicon hip04 ethernet
net: hisilicon: new hip04 MDIO driver
.../bindings/net/hisilicon-hip04-net.txt | 88 ++
drivers/net/ethernet/hisilicon/Kconfig | 9 +
drivers/net/ethernet/hisilicon/Makefile | 1 +
drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++
drivers/net/ethernet/hisilicon/hip04_mdio.c | 186 ++++
5 files changed, 1253 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c
--
1.8.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
2015-01-14 6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
@ 2015-01-14 6:34 ` Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver Ding Tianhong
` (2 subsequent siblings)
3 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14 6:34 UTC (permalink / raw)
To: arnd, robh+dt, davem, grant.likely, agraf
Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
zhangfei.gao, netdev, devicetree, linux
From: Zhangfei Gao <zhangfei.gao@linaro.org>
This patch adds the Device Tree bindings for the Hisilicon hip04
Ethernet controller, including 100M / 1000M controller.
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
.../bindings/net/hisilicon-hip04-net.txt | 88 ++++++++++++++++++++++
1 file changed, 88 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
new file mode 100644
index 0000000..988fc69
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -0,0 +1,88 @@
+Hisilicon hip04 Ethernet Controller
+
+* Ethernet controller node
+
+Required properties:
+- compatible: should be "hisilicon,hip04-mac".
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device.
+- port-handle: <phandle port channel>
+ phandle, specifies a reference to the syscon ppe node
+ port, port number connected to the controller
+ channel, recv channel start from channel * number (RX_DESC_NUM)
+- phy-mode: see ethernet.txt [1].
+
+Optional properties:
+- phy-handle: see ethernet.txt [1].
+
+[1] Documentation/devicetree/bindings/net/ethernet.txt
+
+
+* Ethernet ppe node:
+Control rx & tx fifos of all ethernet controllers.
+Have 2048 recv channels shared by all ethernet controllers, only if no overlap.
+Each controller's recv channel start from channel * number (RX_DESC_NUM).
+
+Required properties:
+- compatible: "hisilicon,hip04-ppe", "syscon".
+- reg: address and length of the register set for the device.
+
+
+* MDIO bus node:
+
+Required properties:
+
+- compatible: should be "hisilicon,hip04-mdio".
+- Inherits from MDIO bus node binding [2]
+[2] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+ mdio {
+ compatible = "hisilicon,hip04-mdio";
+ reg = <0x28f1000 0x1000>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ phy0: ethernet-phy@0 {
+ compatible = "ethernet-phy-ieee802.3-c22";
+ reg = <0>;
+ marvell,reg-init = <18 0x14 0 0x8001>;
+ };
+
+ phy1: ethernet-phy@1 {
+ compatible = "ethernet-phy-ieee802.3-c22";
+ reg = <1>;
+ marvell,reg-init = <18 0x14 0 0x8001>;
+ };
+ };
+
+ ppe: ppe@28c0000 {
+ compatible = "hisilicon,hip04-ppe", "syscon";
+ reg = <0x28c0000 0x10000>;
+ };
+
+ fe: ethernet@28b0000 {
+ compatible = "hisilicon,hip04-mac";
+ reg = <0x28b0000 0x10000>;
+ interrupts = <0 413 4>;
+ phy-mode = "mii";
+ port-handle = <&ppe 31 0>;
+ };
+
+ ge0: ethernet@2800000 {
+ compatible = "hisilicon,hip04-mac";
+ reg = <0x2800000 0x10000>;
+ interrupts = <0 402 4>;
+ phy-mode = "sgmii";
+ port-handle = <&ppe 0 1>;
+ phy-handle = <&phy0>;
+ };
+
+ ge8: ethernet@2880000 {
+ compatible = "hisilicon,hip04-mac";
+ reg = <0x2880000 0x10000>;
+ interrupts = <0 410 4>;
+ phy-mode = "sgmii";
+ port-handle = <&ppe 8 2>;
+ phy-handle = <&phy1>;
+ };
--
1.8.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver
2015-01-14 6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
@ 2015-01-14 6:34 ` Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
3 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14 6:34 UTC (permalink / raw)
To: arnd, robh+dt, davem, grant.likely, agraf
Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
zhangfei.gao, netdev, devicetree, linux
From: Zhangfei Gao <zhangfei.gao@linaro.org>
Hisilicon hip04 platform mdio driver
Reuse Marvell phy drivers/net/phy/marvell.c
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
drivers/net/ethernet/hisilicon/Kconfig | 9 ++
drivers/net/ethernet/hisilicon/Makefile | 1 +
drivers/net/ethernet/hisilicon/hip04_mdio.c | 186 ++++++++++++++++++++++++++++
3 files changed, 196 insertions(+)
create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c
diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
index e942173..a54d897 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -24,4 +24,13 @@ config HIX5HD2_GMAC
help
This selects the hix5hd2 mac family network device.
+config HIP04_ETH
+ tristate "HISILICON P04 Ethernet support"
+ select PHYLIB
+ select MARVELL_PHY
+ select MFD_SYSCON
+ ---help---
+ If you wish to compile a kernel for a hardware with hisilicon p04 SoC and
+ want to use the internal ethernet then you should answer Y to this.
+
endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 9175e846..40115a7 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,3 +3,4 @@
#
obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c b/drivers/net/ethernet/hisilicon/hip04_mdio.c
new file mode 100644
index 0000000..b3bac25
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_mdio.c
@@ -0,0 +1,186 @@
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
+#include <linux/of_mdio.h>
+#include <linux/delay.h>
+
+#define MDIO_CMD_REG 0x0
+#define MDIO_ADDR_REG 0x4
+#define MDIO_WDATA_REG 0x8
+#define MDIO_RDATA_REG 0xc
+#define MDIO_STA_REG 0x10
+
+#define MDIO_START BIT(14)
+#define MDIO_R_VALID BIT(1)
+#define MDIO_READ (BIT(12) | BIT(11) | MDIO_START)
+#define MDIO_WRITE (BIT(12) | BIT(10) | MDIO_START)
+
+struct hip04_mdio_priv {
+ void __iomem *base;
+};
+
+#define WAIT_TIMEOUT 10
+static int hip04_mdio_wait_ready(struct mii_bus *bus)
+{
+ struct hip04_mdio_priv *priv = bus->priv;
+ int i;
+
+ for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) {
+ if (i == WAIT_TIMEOUT)
+ return -ETIMEDOUT;
+ msleep(20);
+ }
+
+ return 0;
+}
+
+static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+{
+ struct hip04_mdio_priv *priv = bus->priv;
+ u32 val;
+ int ret;
+
+ ret = hip04_mdio_wait_ready(bus);
+ if (ret < 0)
+ goto out;
+
+ val = regnum | (mii_id << 5) | MDIO_READ;
+ writel_relaxed(val, priv->base + MDIO_CMD_REG);
+
+ ret = hip04_mdio_wait_ready(bus);
+ if (ret < 0)
+ goto out;
+
+ val = readl_relaxed(priv->base + MDIO_STA_REG);
+ if (val & MDIO_R_VALID) {
+ dev_err(bus->parent, "SMI bus read not valid\n");
+ ret = -ENODEV;
+ goto out;
+ }
+
+ val = readl_relaxed(priv->base + MDIO_RDATA_REG);
+ ret = val & 0xFFFF;
+out:
+ return ret;
+}
+
+static int hip04_mdio_write(struct mii_bus *bus, int mii_id,
+ int regnum, u16 value)
+{
+ struct hip04_mdio_priv *priv = bus->priv;
+ u32 val;
+ int ret;
+
+ ret = hip04_mdio_wait_ready(bus);
+ if (ret < 0)
+ goto out;
+
+ writel_relaxed(value, priv->base + MDIO_WDATA_REG);
+ val = regnum | (mii_id << 5) | MDIO_WRITE;
+ writel_relaxed(val, priv->base + MDIO_CMD_REG);
+out:
+ return ret;
+}
+
+static int hip04_mdio_reset(struct mii_bus *bus)
+{
+ int temp, i;
+
+ for (i = 0; i < PHY_MAX_ADDR; i++) {
+ hip04_mdio_write(bus, i, 22, 0);
+ temp = hip04_mdio_read(bus, i, MII_BMCR);
+ if (temp < 0)
+ continue;
+
+ temp |= BMCR_RESET;
+ if (hip04_mdio_write(bus, i, MII_BMCR, temp) < 0)
+ continue;
+ }
+
+ mdelay(500);
+ return 0;
+}
+
+static int hip04_mdio_probe(struct platform_device *pdev)
+{
+ struct resource *r;
+ struct mii_bus *bus;
+ struct hip04_mdio_priv *priv;
+ int ret;
+
+ bus = mdiobus_alloc_size(sizeof(struct hip04_mdio_priv));
+ if (!bus) {
+ dev_err(&pdev->dev, "Cannot allocate MDIO bus\n");
+ return -ENOMEM;
+ }
+
+ bus->name = "hip04_mdio_bus";
+ bus->read = hip04_mdio_read;
+ bus->write = hip04_mdio_write;
+ bus->reset = hip04_mdio_reset;
+ snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(&pdev->dev));
+ bus->parent = &pdev->dev;
+ priv = bus->priv;
+
+ r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ priv->base = devm_ioremap_resource(&pdev->dev, r);
+ if (IS_ERR(priv->base)) {
+ ret = PTR_ERR(priv->base);
+ goto out_mdio;
+ }
+
+ ret = of_mdiobus_register(bus, pdev->dev.of_node);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
+ goto out_mdio;
+ }
+
+ platform_set_drvdata(pdev, bus);
+
+ return 0;
+
+out_mdio:
+ mdiobus_free(bus);
+ return ret;
+}
+
+static int hip04_mdio_remove(struct platform_device *pdev)
+{
+ struct mii_bus *bus = platform_get_drvdata(pdev);
+
+ mdiobus_unregister(bus);
+ mdiobus_free(bus);
+
+ return 0;
+}
+
+static const struct of_device_id hip04_mdio_match[] = {
+ { .compatible = "hisilicon,hip04-mdio" },
+ { }
+};
+MODULE_DEVICE_TABLE(of, hip04_mdio_match);
+
+static struct platform_driver hip04_mdio_driver = {
+ .probe = hip04_mdio_probe,
+ .remove = hip04_mdio_remove,
+ .driver = {
+ .name = "hip04-mdio",
+ .owner = THIS_MODULE,
+ .of_match_table = hip04_mdio_match,
+ },
+};
+
+module_platform_driver(hip04_mdio_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 MDIO interface driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-mdio");
--
1.8.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver Ding Tianhong
@ 2015-01-14 6:34 ` Ding Tianhong
2015-01-14 8:06 ` Joe Perches
` (5 more replies)
2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
3 siblings, 6 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14 6:34 UTC (permalink / raw)
To: arnd, robh+dt, davem, grant.likely, agraf
Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
zhangfei.gao, netdev, devicetree, linux
Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
v13: Fix the problem of alignment parameters for function and checkpatch warming.
v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
for hip04 ethernet.
v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
is not supported for this patch, but I think it could work for hip04,
will support it later after some tests for performance better.
Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
- Before:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
- After:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
v10: According David Miller and Arnd Bergmann's suggestion, add some modification
for v9 version
- drop the workqueue
- batch cleanup based on tx_coalesce_frames/usecs for better throughput
- use a reasonable default tx timeout (200us, could be shorted
based on measurements) with a range timer
- fix napi poll function return value
- use a lockless queue for cleanup
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
drivers/net/ethernet/hisilicon/Makefile | 2 +-
drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
2 files changed, 970 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 40115a7..6c14540 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,4 +3,4 @@
#
obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..525214e
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,969 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/ktime.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR 0x100
+#define PPE_CFG_POOL_GRP 0x300
+#define PPE_CFG_RX_BUF_SIZE 0x400
+#define PPE_CFG_RX_FIFO_SIZE 0x500
+#define PPE_CURR_BUF_CNT 0xa200
+
+#define GE_DUPLEX_TYPE 0x08
+#define GE_MAX_FRM_SIZE_REG 0x3c
+#define GE_PORT_MODE 0x40
+#define GE_PORT_EN 0x44
+#define GE_SHORT_RUNTS_THR_REG 0x50
+#define GE_TX_LOCAL_PAGE_REG 0x5c
+#define GE_TRANSMIT_CONTROL_REG 0x60
+#define GE_CF_CRC_STRIP_REG 0x1b0
+#define GE_MODE_CHANGE_REG 0x1b4
+#define GE_RECV_CONTROL_REG 0x1e0
+#define GE_STATION_MAC_ADDRESS 0x210
+#define PPE_CFG_CPU_ADD_ADDR 0x580
+#define PPE_CFG_MAX_FRAME_LEN_REG 0x408
+#define PPE_CFG_BUS_CTRL_REG 0x424
+#define PPE_CFG_RX_CTRL_REG 0x428
+#define PPE_CFG_RX_PKT_MODE_REG 0x438
+#define PPE_CFG_QOS_VMID_GEN 0x500
+#define PPE_CFG_RX_PKT_INT 0x538
+#define PPE_INTEN 0x600
+#define PPE_INTSTS 0x608
+#define PPE_RINT 0x604
+#define PPE_CFG_STS_MODE 0x700
+#define PPE_HIS_RX_PKT_CNT 0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT BIT(10)
+#define RCV_NOBUF BIT(8)
+#define RCV_DROP BIT(7)
+#define TX_DROP BIT(6)
+#define DEF_INT_ERR (RCV_NOBUF | RCV_DROP | TX_DROP)
+#define DEF_INT_MASK (RCV_INT | DEF_INT_ERR)
+
+/* TX descriptor config */
+#define TX_FREE_MEM BIT(0)
+#define TX_READ_ALLOC_L3 BIT(1)
+#define TX_FINISH_CACHE_INV BIT(2)
+#define TX_CLEAR_WB BIT(4)
+#define TX_L3_CHECKSUM BIT(5)
+#define TX_LOOP_BACK BIT(11)
+
+/* RX error */
+#define RX_PKT_DROP BIT(0)
+#define RX_L2_ERR BIT(1)
+#define RX_PKT_ERR (RX_PKT_DROP | RX_L2_ERR)
+
+#define SGMII_SPEED_1000 0x08
+#define SGMII_SPEED_100 0x07
+#define SGMII_SPEED_10 0x06
+#define MII_SPEED_100 0x01
+#define MII_SPEED_10 0x00
+
+#define GE_DUPLEX_FULL BIT(0)
+#define GE_DUPLEX_HALF 0x00
+#define GE_MODE_CHANGE_EN BIT(0)
+
+#define GE_TX_AUTO_NEG BIT(5)
+#define GE_TX_ADD_CRC BIT(6)
+#define GE_TX_SHORT_PAD_THROUGH BIT(7)
+
+#define GE_RX_STRIP_CRC BIT(0)
+#define GE_RX_STRIP_PAD BIT(3)
+#define GE_RX_PAD_EN BIT(4)
+
+#define GE_AUTO_NEG_CTL BIT(0)
+
+#define GE_RX_INT_THRESHOLD BIT(6)
+#define GE_RX_TIMEOUT 0x04
+
+#define GE_RX_PORT_EN BIT(1)
+#define GE_TX_PORT_EN BIT(2)
+
+#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12)
+
+#define PPE_CFG_RX_PKT_ALIGN BIT(18)
+#define PPE_CFG_QOS_VMID_MODE BIT(14)
+#define PPE_CFG_QOS_VMID_GRP_SHIFT 8
+
+#define PPE_CFG_RX_FIFO_FSFU BIT(11)
+#define PPE_CFG_RX_DEPTH_SHIFT 16
+#define PPE_CFG_RX_START_SHIFT 0
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT 11
+
+#define PPE_CFG_BUS_LOCAL_REL BIT(14)
+#define PPE_CFG_BUS_BIG_ENDIEN BIT(0)
+
+#define RX_DESC_NUM 128
+#define TX_DESC_NUM 256
+#define TX_NEXT(N) (((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N) (((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN 379
+#define GMAC_MAX_PKT_LEN 1516
+#define GMAC_MIN_PKT_LEN 31
+#define RX_BUF_SIZE 1600
+#define RESET_TIMEOUT 1000
+#define TX_TIMEOUT (6 * HZ)
+
+#define DRV_NAME "hip04-ether"
+#define DRV_VERSION "v1.0"
+
+#define HIP04_MAX_TX_COALESCE_USECS 200
+#define HIP04_MIN_TX_COALESCE_USECS 100
+#define HIP04_MAX_TX_COALESCE_FRAMES 200
+#define HIP04_MIN_TX_COALESCE_FRAMES 100
+
+struct tx_desc {
+ u32 send_addr;
+ u32 send_size;
+ u32 next_addr;
+ u32 cfg;
+ u32 wb_addr;
+} __aligned(64);
+
+struct rx_desc {
+ u16 reserved_16;
+ u16 pkt_len;
+ u32 reserve1[3];
+ u32 pkt_err;
+ u32 reserve2[4];
+};
+
+struct hip04_priv {
+ void __iomem *base;
+ int phy_mode;
+ int chan;
+ unsigned int port;
+ unsigned int speed;
+ unsigned int duplex;
+ unsigned int reg_inten;
+
+ struct napi_struct napi;
+ struct net_device *ndev;
+
+ struct tx_desc *tx_desc;
+ dma_addr_t tx_desc_dma;
+ struct sk_buff *tx_skb[TX_DESC_NUM];
+ dma_addr_t tx_phys[TX_DESC_NUM];
+ unsigned int tx_head;
+
+ int tx_coalesce_frames;
+ int tx_coalesce_usecs;
+ struct hrtimer tx_coalesce_timer;
+
+ unsigned char *rx_buf[RX_DESC_NUM];
+ dma_addr_t rx_phys[RX_DESC_NUM];
+ unsigned int rx_head;
+ unsigned int rx_buf_size;
+
+ struct device_node *phy_node;
+ struct phy_device *phy;
+ struct regmap *map;
+ struct work_struct tx_timeout_task;
+
+ /* written only by tx cleanup */
+ unsigned int tx_tail ____cacheline_aligned_in_smp;
+};
+
+static inline unsigned int tx_count(unsigned int head, unsigned int tail)
+{
+ return (head - tail) % (TX_DESC_NUM - 1);
+}
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ u32 val;
+
+ priv->speed = speed;
+ priv->duplex = duplex;
+
+ switch (priv->phy_mode) {
+ case PHY_INTERFACE_MODE_SGMII:
+ if (speed == SPEED_1000)
+ val = SGMII_SPEED_1000;
+ else if (speed == SPEED_100)
+ val = SGMII_SPEED_100;
+ else
+ val = SGMII_SPEED_10;
+ break;
+ case PHY_INTERFACE_MODE_MII:
+ if (speed == SPEED_100)
+ val = MII_SPEED_100;
+ else
+ val = MII_SPEED_10;
+ break;
+ default:
+ netdev_warn(ndev, "not supported mode\n");
+ val = MII_SPEED_10;
+ break;
+ }
+ writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+ val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
+ writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+ val = GE_MODE_CHANGE_EN;
+ writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+ u32 val, tmp, timeout = 0;
+
+ do {
+ regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+ regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+ if (timeout++ > RESET_TIMEOUT)
+ break;
+ } while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+ u32 val;
+
+ val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+ val |= PPE_CFG_STS_RX_PKT_CNT_RC;
+ writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+ val = BIT(priv->port);
+ regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+ val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
+ val |= PPE_CFG_QOS_VMID_MODE;
+ writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+ val = RX_BUF_SIZE;
+ regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+ val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
+ val |= PPE_CFG_RX_FIFO_FSFU;
+ val |= priv->chan << PPE_CFG_RX_START_SHIFT;
+ regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+ val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
+ writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+ val = PPE_CFG_RX_PKT_ALIGN;
+ writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+ val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
+ writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+ val = GMAC_PPE_RX_PKT_MAX_LEN;
+ writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+ val = GMAC_MAX_PKT_LEN;
+ writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+ val = GMAC_MIN_PKT_LEN;
+ writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+ val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+ val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
+ writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+ val = GE_RX_STRIP_CRC;
+ writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+ val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+ val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
+ writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+ val = GE_AUTO_NEG_CTL;
+ writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ u32 val;
+
+ /* enable tx & rx */
+ val = readl_relaxed(priv->base + GE_PORT_EN);
+ val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
+ writel_relaxed(val, priv->base + GE_PORT_EN);
+
+ /* clear rx int */
+ val = RCV_INT;
+ writel_relaxed(val, priv->base + PPE_RINT);
+
+ /* config recv int */
+ val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
+ writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+ /* enable interrupt */
+ priv->reg_inten = DEF_INT_MASK;
+ writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ u32 val;
+
+ /* disable int */
+ priv->reg_inten &= ~(DEF_INT_MASK);
+ writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+ /* disable tx & rx */
+ val = readl_relaxed(priv->base + GE_PORT_EN);
+ val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
+ writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+ writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+ regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+ return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+
+ writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+ priv->base + GE_STATION_MAC_ADDRESS);
+ writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+ (ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+ priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+ eth_mac_addr(ndev, addr);
+ hip04_update_mac_address(ndev);
+ return 0;
+}
+
+static int hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ unsigned tx_tail = priv->tx_tail;
+ struct tx_desc *desc;
+ unsigned int bytes_compl = 0, pkts_compl = 0;
+ unsigned int count;
+
+ smp_rmb();
+ count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
+ if (count == 0)
+ goto out;
+
+ while (count) {
+ desc = &priv->tx_desc[tx_tail];
+ if (desc->send_addr != 0) {
+ if (force)
+ desc->send_addr = 0;
+ else
+ break;
+ }
+
+ if (priv->tx_phys[tx_tail]) {
+ dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+ priv->tx_skb[tx_tail]->len,
+ DMA_TO_DEVICE);
+ priv->tx_phys[tx_tail] = 0;
+ }
+ pkts_compl++;
+ bytes_compl += priv->tx_skb[tx_tail]->len;
+ dev_kfree_skb(priv->tx_skb[tx_tail]);
+ priv->tx_skb[tx_tail] = NULL;
+ tx_tail = TX_NEXT(tx_tail);
+ count--;
+ }
+
+ priv->tx_tail = tx_tail;
+ smp_wmb(); /* Ensure tx_tail visible to xmit */
+
+out:
+ if (pkts_compl || bytes_compl)
+ netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+ if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
+ netif_wake_queue(ndev);
+
+ return count;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
+ unsigned int tx_head = priv->tx_head, count;
+ struct tx_desc *desc = &priv->tx_desc[tx_head];
+ dma_addr_t phys;
+
+ smp_rmb();
+ count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
+ if (count == (TX_DESC_NUM - 1)) {
+ netif_stop_queue(ndev);
+ return NETDEV_TX_BUSY;
+ }
+
+ phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+ if (dma_mapping_error(&ndev->dev, phys)) {
+ dev_kfree_skb(skb);
+ return NETDEV_TX_OK;
+ }
+
+ priv->tx_skb[tx_head] = skb;
+ priv->tx_phys[tx_head] = phys;
+ desc->send_addr = cpu_to_be32(phys);
+ desc->send_size = cpu_to_be32(skb->len);
+ desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+ phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+ desc->wb_addr = cpu_to_be32(phys);
+ skb_tx_timestamp(skb);
+
+ hip04_set_xmit_desc(priv, phys);
+ priv->tx_head = TX_NEXT(tx_head);
+ count++;
+ netdev_sent_queue(ndev, skb->len);
+
+ stats->tx_bytes += skb->len;
+ stats->tx_packets++;
+
+ /* Ensure tx_head update visible to tx reclaim */
+ smp_wmb();
+
+ /* queue is getting full, better start cleaning up now */
+ if (count >= priv->tx_coalesce_frames) {
+ if (napi_schedule_prep(&priv->napi)) {
+ /* disable rx interrupt and timer */
+ priv->reg_inten &= ~(RCV_INT);
+ writel_relaxed(DEF_INT_MASK & ~RCV_INT,
+ priv->base + PPE_INTEN);
+ hrtimer_cancel(&priv->tx_coalesce_timer);
+ __napi_schedule(&priv->napi);
+ }
+ } else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
+ /* cleanup not pending yet, start a new timer */
+ hrtimer_start_expires(&priv->tx_coalesce_timer,
+ HRTIMER_MODE_REL);
+ }
+
+ return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+ struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+ struct net_device *ndev = priv->ndev;
+ struct net_device_stats *stats = &ndev->stats;
+ unsigned int cnt = hip04_recv_cnt(priv);
+ struct rx_desc *desc;
+ struct sk_buff *skb;
+ unsigned char *buf;
+ bool last = false;
+ dma_addr_t phys;
+ int rx = 0;
+ int tx_remaining;
+ u16 len;
+ u32 err;
+
+ while (cnt && !last) {
+ buf = priv->rx_buf[priv->rx_head];
+ skb = build_skb(buf, priv->rx_buf_size);
+ if (unlikely(!skb))
+ net_dbg_ratelimited("build_skb failed\n");
+
+ dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+ RX_BUF_SIZE, DMA_FROM_DEVICE);
+ priv->rx_phys[priv->rx_head] = 0;
+
+ desc = (struct rx_desc *)skb->data;
+ len = be16_to_cpu(desc->pkt_len);
+ err = be32_to_cpu(desc->pkt_err);
+
+ if (0 == len) {
+ dev_kfree_skb_any(skb);
+ last = true;
+ } else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
+ dev_kfree_skb_any(skb);
+ stats->rx_dropped++;
+ stats->rx_errors++;
+ } else {
+ skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+ skb_put(skb, len);
+ skb->protocol = eth_type_trans(skb, ndev);
+ napi_gro_receive(&priv->napi, skb);
+ stats->rx_packets++;
+ stats->rx_bytes += len;
+ rx++;
+ }
+
+ buf = netdev_alloc_frag(priv->rx_buf_size);
+ if (!buf)
+ goto done;
+ phys = dma_map_single(&ndev->dev, buf,
+ RX_BUF_SIZE, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&ndev->dev, phys))
+ goto done;
+ priv->rx_buf[priv->rx_head] = buf;
+ priv->rx_phys[priv->rx_head] = phys;
+ hip04_set_recv_desc(priv, phys);
+
+ priv->rx_head = RX_NEXT(priv->rx_head);
+ if (rx >= budget)
+ goto done;
+
+ if (--cnt == 0)
+ cnt = hip04_recv_cnt(priv);
+ }
+
+ if (!(priv->reg_inten & RCV_INT)) {
+ /* enable rx interrupt */
+ priv->reg_inten |= RCV_INT;
+ writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+ }
+ napi_complete(napi);
+done:
+ /* clean up tx descriptors and start a new timer if necessary */
+ tx_remaining = hip04_tx_reclaim(ndev, false);
+ if (rx < budget && tx_remaining)
+ hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
+
+ return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+ struct net_device *ndev = (struct net_device *)dev_id;
+ struct hip04_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
+ u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+ if (!ists)
+ return IRQ_NONE;
+
+ writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+ if (unlikely(ists & DEF_INT_ERR)) {
+ if (ists & (RCV_NOBUF | RCV_DROP))
+ stats->rx_errors++;
+ stats->rx_dropped++;
+ netdev_err(ndev, "rx drop\n");
+ if (ists & TX_DROP) {
+ stats->tx_dropped++;
+ netdev_err(ndev, "tx drop\n");
+ }
+ }
+
+ if (ists & RCV_INT && napi_schedule_prep(&priv->napi)) {
+ /* disable rx interrupt */
+ priv->reg_inten &= ~(RCV_INT);
+ writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+ hrtimer_cancel(&priv->tx_coalesce_timer);
+ __napi_schedule(&priv->napi);
+ }
+
+ return IRQ_HANDLED;
+}
+
+enum hrtimer_restart tx_done(struct hrtimer *hrtimer)
+{
+ struct hip04_priv *priv;
+
+ priv = container_of(hrtimer, struct hip04_priv, tx_coalesce_timer);
+
+ if (napi_schedule_prep(&priv->napi)) {
+ /* disable rx interrupt */
+ priv->reg_inten &= ~(RCV_INT);
+ writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+ __napi_schedule(&priv->napi);
+ }
+
+ return HRTIMER_NORESTART;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ struct phy_device *phy = priv->phy;
+
+ if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+ hip04_config_port(ndev, phy->speed, phy->duplex);
+ phy_print_status(phy);
+ }
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ int i;
+
+ priv->rx_head = 0;
+ priv->tx_head = 0;
+ priv->tx_tail = 0;
+ hip04_reset_ppe(priv);
+
+ for (i = 0; i < RX_DESC_NUM; i++) {
+ dma_addr_t phys;
+
+ phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+ RX_BUF_SIZE, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&ndev->dev, phys))
+ return -EIO;
+
+ priv->rx_phys[i] = phys;
+ hip04_set_recv_desc(priv, phys);
+ }
+
+ if (priv->phy)
+ phy_start(priv->phy);
+
+ netdev_reset_queue(ndev);
+ netif_start_queue(ndev);
+ hip04_mac_enable(ndev);
+ napi_enable(&priv->napi);
+
+ return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ int i;
+
+ napi_disable(&priv->napi);
+ netif_stop_queue(ndev);
+ hip04_mac_disable(ndev);
+ hip04_tx_reclaim(ndev, true);
+ hip04_reset_ppe(priv);
+
+ if (priv->phy)
+ phy_stop(priv->phy);
+
+ for (i = 0; i < RX_DESC_NUM; i++) {
+ if (priv->rx_phys[i]) {
+ dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+ RX_BUF_SIZE, DMA_FROM_DEVICE);
+ priv->rx_phys[i] = 0;
+ }
+ }
+
+ return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+
+ schedule_work(&priv->tx_timeout_task);
+}
+
+static void hip04_tx_timeout_task(struct work_struct *work)
+{
+ struct hip04_priv *priv;
+
+ priv = container_of(work, struct hip04_priv, tx_timeout_task);
+ hip04_mac_stop(priv->ndev);
+ hip04_mac_open(priv->ndev);
+}
+
+static struct net_device_stats *hip04_get_stats(struct net_device *ndev)
+{
+ return &ndev->stats;
+}
+
+static int hip04_get_coalesce(struct net_device *netdev,
+ struct ethtool_coalesce *ec)
+{
+ struct hip04_priv *priv = netdev_priv(netdev);
+
+ ec->tx_coalesce_usecs = priv->tx_coalesce_usecs;
+ ec->tx_max_coalesced_frames = priv->tx_coalesce_frames;
+
+ return 0;
+}
+
+static int hip04_set_coalesce(struct net_device *netdev,
+ struct ethtool_coalesce *ec)
+{
+ struct hip04_priv *priv = netdev_priv(netdev);
+
+ /* Check not supported parameters */
+ if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
+ (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
+ (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
+ (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
+ (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
+ (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
+ (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
+ (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
+ (ec->tx_max_coalesced_frames_irq) ||
+ (ec->stats_block_coalesce_usecs) ||
+ (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
+ return -EOPNOTSUPP;
+
+ if ((ec->tx_coalesce_usecs > HIP04_MAX_TX_COALESCE_USECS ||
+ ec->tx_coalesce_usecs < HIP04_MIN_TX_COALESCE_USECS) ||
+ (ec->tx_max_coalesced_frames > HIP04_MAX_TX_COALESCE_FRAMES ||
+ ec->tx_max_coalesced_frames < HIP04_MIN_TX_COALESCE_FRAMES))
+ return -EINVAL;
+
+ priv->tx_coalesce_usecs = ec->tx_coalesce_usecs;
+ priv->tx_coalesce_frames = ec->tx_max_coalesced_frames;
+
+ return 0;
+}
+
+static void hip04_get_drvinfo(struct net_device *netdev,
+ struct ethtool_drvinfo *drvinfo)
+{
+ strlcpy(drvinfo->driver, DRV_NAME, sizeof(drvinfo->driver));
+ strlcpy(drvinfo->version, DRV_VERSION, sizeof(drvinfo->version));
+}
+
+static struct ethtool_ops hip04_ethtool_ops = {
+ .get_coalesce = hip04_get_coalesce,
+ .set_coalesce = hip04_set_coalesce,
+ .get_drvinfo = hip04_get_drvinfo,
+};
+
+static struct net_device_ops hip04_netdev_ops = {
+ .ndo_open = hip04_mac_open,
+ .ndo_stop = hip04_mac_stop,
+ .ndo_get_stats = hip04_get_stats,
+ .ndo_start_xmit = hip04_mac_start_xmit,
+ .ndo_set_mac_address = hip04_set_mac_address,
+ .ndo_tx_timeout = hip04_timeout,
+ .ndo_validate_addr = eth_validate_addr,
+ .ndo_change_mtu = eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ int i;
+
+ priv->tx_desc = dma_alloc_coherent(d,
+ TX_DESC_NUM * sizeof(struct tx_desc),
+ &priv->tx_desc_dma, GFP_KERNEL);
+ if (!priv->tx_desc)
+ return -ENOMEM;
+
+ priv->rx_buf_size = RX_BUF_SIZE +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ for (i = 0; i < RX_DESC_NUM; i++) {
+ priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+ if (!priv->rx_buf[i])
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+ struct hip04_priv *priv = netdev_priv(ndev);
+ int i;
+
+ for (i = 0; i < RX_DESC_NUM; i++)
+ if (priv->rx_buf[i])
+ put_page(virt_to_head_page(priv->rx_buf[i]));
+
+ for (i = 0; i < TX_DESC_NUM; i++)
+ if (priv->tx_skb[i])
+ dev_kfree_skb_any(priv->tx_skb[i]);
+
+ dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+ priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+ struct device *d = &pdev->dev;
+ struct device_node *node = d->of_node;
+ struct of_phandle_args arg;
+ struct net_device *ndev;
+ struct hip04_priv *priv;
+ struct resource *res;
+ unsigned int irq;
+ ktime_t txtime;
+ int ret;
+
+ ndev = alloc_etherdev(sizeof(struct hip04_priv));
+ if (!ndev)
+ return -ENOMEM;
+
+ priv = netdev_priv(ndev);
+ priv->ndev = ndev;
+ platform_set_drvdata(pdev, ndev);
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ priv->base = devm_ioremap_resource(d, res);
+ if (IS_ERR(priv->base)) {
+ ret = PTR_ERR(priv->base);
+ goto init_fail;
+ }
+
+ ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+ if (ret < 0) {
+ dev_warn(d, "no port-handle\n");
+ goto init_fail;
+ }
+
+ priv->port = arg.args[0];
+ priv->chan = arg.args[1] * RX_DESC_NUM;
+
+ hrtimer_init(&priv->tx_coalesce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+ /* BQL will try to keep the TX queue as short as possible, but it can't
+ * be faster than tx_coalesce_usecs, so we need a fast timeout here,
+ * but also long enough to gather up enough frames to ensure we don't
+ * get more interrupts than necessary.
+ * 200us is enough for 16 frames of 1500 bytes at gigabit ethernet rate
+ */
+ priv->tx_coalesce_frames = TX_DESC_NUM * 3 / 4;
+ priv->tx_coalesce_usecs = 200;
+ /* allow timer to fire after half the time at the earliest */
+ txtime = ktime_set(0, priv->tx_coalesce_usecs * NSEC_PER_USEC / 2);
+ hrtimer_set_expires_range(&priv->tx_coalesce_timer, txtime, txtime);
+ priv->tx_coalesce_timer.function = tx_done;
+
+ priv->map = syscon_node_to_regmap(arg.np);
+ if (IS_ERR(priv->map)) {
+ dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+ ret = PTR_ERR(priv->map);
+ goto init_fail;
+ }
+
+ priv->phy_mode = of_get_phy_mode(node);
+ if (priv->phy_mode < 0) {
+ dev_warn(d, "not find phy-mode\n");
+ ret = -EINVAL;
+ goto init_fail;
+ }
+
+ irq = platform_get_irq(pdev, 0);
+ if (irq <= 0) {
+ ret = -EINVAL;
+ goto init_fail;
+ }
+
+ ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+ 0, pdev->name, ndev);
+ if (ret) {
+ netdev_err(ndev, "devm_request_irq failed\n");
+ goto init_fail;
+ }
+
+ priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+ if (priv->phy_node) {
+ priv->phy = of_phy_connect(ndev, priv->phy_node,
+ &hip04_adjust_link,
+ 0, priv->phy_mode);
+ if (!priv->phy) {
+ ret = -EPROBE_DEFER;
+ goto init_fail;
+ }
+ }
+
+ INIT_WORK(&priv->tx_timeout_task, hip04_tx_timeout_task);
+
+ ether_setup(ndev);
+ ndev->netdev_ops = &hip04_netdev_ops;
+ ndev->ethtool_ops = &hip04_ethtool_ops;
+ ndev->watchdog_timeo = TX_TIMEOUT;
+ ndev->priv_flags |= IFF_UNICAST_FLT;
+ ndev->irq = irq;
+ netif_napi_add(ndev, &priv->napi, hip04_rx_poll, NAPI_POLL_WEIGHT);
+ SET_NETDEV_DEV(ndev, &pdev->dev);
+
+ hip04_reset_ppe(priv);
+ if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+ hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+ hip04_config_fifo(priv);
+ random_ether_addr(ndev->dev_addr);
+ hip04_update_mac_address(ndev);
+
+ ret = hip04_alloc_ring(ndev, d);
+ if (ret) {
+ netdev_err(ndev, "alloc ring fail\n");
+ goto alloc_fail;
+ }
+
+ ret = register_netdev(ndev);
+ if (ret) {
+ free_netdev(ndev);
+ goto alloc_fail;
+ }
+
+ return 0;
+
+alloc_fail:
+ hip04_free_ring(ndev, d);
+init_fail:
+ of_node_put(priv->phy_node);
+ free_netdev(ndev);
+ return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+ struct net_device *ndev = platform_get_drvdata(pdev);
+ struct hip04_priv *priv = netdev_priv(ndev);
+ struct device *d = &pdev->dev;
+
+ if (priv->phy)
+ phy_disconnect(priv->phy);
+
+ hip04_free_ring(ndev, d);
+ unregister_netdev(ndev);
+ free_irq(ndev->irq, ndev);
+ of_node_put(priv->phy_node);
+ cancel_work_sync(&priv->tx_timeout_task);
+ free_netdev(ndev);
+
+ return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+ { .compatible = "hisilicon,hip04-mac" },
+ { }
+};
+
+MODULE_DEVICE_TABLE(of, hip04_mac_match);
+
+static struct platform_driver hip04_mac_driver = {
+ .probe = hip04_mac_probe,
+ .remove = hip04_remove,
+ .driver = {
+ .name = DRV_NAME,
+ .owner = THIS_MODULE,
+ .of_match_table = hip04_mac_match,
+ },
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
--
1.8.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
@ 2015-01-14 8:06 ` Joe Perches
2015-01-15 2:56 ` Ding Tianhong
2015-01-14 8:53 ` Arnd Bergmann
` (4 subsequent siblings)
5 siblings, 1 reply; 25+ messages in thread
From: Joe Perches @ 2015-01-14 8:06 UTC (permalink / raw)
To: Ding Tianhong
Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
Single comment:
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
[]
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
[]
> + while (cnt && !last) {
> + buf = priv->rx_buf[priv->rx_head];
> + skb = build_skb(buf, priv->rx_buf_size);
> + if (unlikely(!skb))
> + net_dbg_ratelimited("build_skb failed\n");
> +
> + dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
> + RX_BUF_SIZE, DMA_FROM_DEVICE);
> + priv->rx_phys[priv->rx_head] = 0;
> +
> + desc = (struct rx_desc *)skb->data;
Perhaps if (!skb) is possible, there shouldn't be
a dereference of that known null here.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
2015-01-14 8:06 ` Joe Perches
@ 2015-01-14 8:53 ` Arnd Bergmann
2015-01-15 2:54 ` Ding Tianhong
2015-01-14 16:34 ` Eric Dumazet
` (3 subsequent siblings)
5 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-14 8:53 UTC (permalink / raw)
To: Ding Tianhong
Cc: robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
> +#define HIP04_MAX_TX_COALESCE_USECS 200
> +#define HIP04_MIN_TX_COALESCE_USECS 100
> +#define HIP04_MAX_TX_COALESCE_FRAMES 200
> +#define HIP04_MIN_TX_COALESCE_FRAMES 100
It's not important, but in case you are creating another version of the
patch, maybe the allowed range can be extended somewhat. The example values
I picked when I sent my suggestion were really made up. It's great if
they work fine, but users might want to tune this far more depending on
their workloads, How about these
#define HIP04_MAX_TX_COALESCE_USECS 100000
#define HIP04_MIN_TX_COALESCE_USECS 1
#define HIP04_MAX_TX_COALESCE_FRAMES (TX_DESC_NUM - 1)
#define HIP04_MIN_TX_COALESCE_FRAMES 1
Arnd
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
2015-01-14 6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
` (2 preceding siblings ...)
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
@ 2015-01-14 10:19 ` Alexander Graf
2015-01-15 8:37 ` Ding Tianhong
3 siblings, 1 reply; 25+ messages in thread
From: Alexander Graf @ 2015-01-14 10:19 UTC (permalink / raw)
To: Ding Tianhong, arnd, robh+dt, davem, grant.likely
Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
zhangfei.gao, netdev, devicetree, linux
On 14.01.15 07:34, Ding Tianhong wrote:
> v13:
> - Fix the problem of alignment parameters for function and checkpatch warming.
>
> v12:
> - According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
> for hip04 ethernet.
>
> v11:
> - Add ethtool support for tx coalecse getting and setting, the xmit_more
> is not supported for this patch, but I think it could work for hip04,
> will support it later after some tests for performance better.
>
> Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
> it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>
> - Before:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
>
> - After:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
Using v11 on top of 3.18 and after generating some traffic on the
network as well as compiling code on ahci in parallel I ran into the
following OOPS. Any idea what exactly is going on?
>From a 10000 feet perspective it looks like two problems to me
1) Allocation failure doesn't get handled properly somewhere
2) We fail to allocate with order=0 - I don't see why
Alex
[44085.652301] ld: page allocation failure: order:0, mode:0x120
[44085.671314] CPU: 0 PID: 5121 Comm: ld Not tainted 3.18.0-5-lpae #1
[44085.692110] [<c0335814>] (unwind_backtrace) from [<c032fcc8>]
(show_stack+0x18/0x1c)
[44085.718109] [<c032fcc8>] (show_stack) from [<c0a50afc>]
(dump_stack+0x88/0x98)
[44085.742360] [<c0a50afc>] (dump_stack) from [<c0443f98>]
(warn_alloc_failed+0xdc/0x124)
[44085.768938] [<c0443f98>] (warn_alloc_failed) from [<c04469ec>]
(__alloc_pages_nodemask+0x7ac/0xa8c)
[44085.799305] [<c04469ec>] (__alloc_pages_nodemask) from [<c0959a88>]
(__netdev_alloc_frag+0xac/0x17c)
[44085.829994] [<c0959a88>] (__netdev_alloc_frag) from [<bf00051c>]
(hip04_rx_poll+0x168/0x3cc [hip04_eth])
[44085.861830] [<bf00051c>] (hip04_rx_poll [hip04_eth]) from
[<c096c0fc>] (net_rx_action+0x15c/0x258)
[44085.891897] [<c096c0fc>] (net_rx_action) from [<c0369508>]
(__do_softirq+0x138/0x2dc)
[44085.918171] [<c0369508>] (__do_softirq) from [<c0369920>]
(irq_exit+0x80/0xb8)
[44085.942408] [<c0369920>] (irq_exit) from [<c03b3ee0>]
(__handle_domain_irq+0x68/0xa8)
[44085.968685] [<c03b3ee0>] (__handle_domain_irq) from [<c03086ac>]
(hip04_handle_irq+0x38/0x68)
[44085.997296] [<c03086ac>] (hip04_handle_irq) from [<c0a56d48>]
(__irq_usr+0x48/0x60)
[44086.022977] Exception stack(0xd3fbbfb0 to 0xd3fbbff8)
[44086.039925] bfa0: 03fa94f8
01fc2598 01fc2598 00000000
[44086.067357] bfc0: 03fa9458 40482000 00000000 400878b0 004e46e8
00000040 01f61f80 00000013
[44086.094785] bfe0: fbad2488 bed4c270 403aeb5c 403bbe84 200e0010 ffffffff
[44086.116970] Mem-info:
[44086.124597] DMA per-cpu:
[44086.133098] CPU 0: hi: 186, btch: 31 usd: 191
[44086.149167] HighMem per-cpu:
[44086.158829] CPU 0: hi: 186, btch: 31 usd: 10
[44086.174919] active_anon:78994 inactive_anon:57435 isolated_anon:0
[44086.174919] active_file:721847 inactive_file:559931 isolated_file:0
[44086.174919] unevictable:20 dirty:567 writeback:0 unstable:0
[44086.174919] free:2674014 slab_reclaimable:41072 slab_unreclaimable:5704
[44086.174919] mapped:13107 shmem:42709 pagetables:933 bounce:0
[44086.174919] free_cma:16216
[44086.286552] DMA free:1088kB min:3012kB low:3764kB high:4516kB
active_anon:584kB inactive_anon:2948kB active_file:183224kB
inactive_file:183268kB unevictable:16kB isolated(anon):0kB
isolated(file):0kB present:778240kB managed:569024kB mlocked:16kB
dirty:28kB writeback:0kB mapped:864kB shmem:2948kB
slab_reclaimable:164288kB slab_unreclaimable:22816kB kernel_stack:776kB
pagetables:3732kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[44086.427835] lowmem_reserve[]: 0 0 15624 15624
[44086.442687] HighMem free:10694968kB min:512kB low:21708kB
high:42908kB active_anon:315392kB inactive_anon:226792kB
active_file:2704164kB inactive_file:2056456kB unevictable:64kB
isolated(anon):0kB isolated(file):0kB present:15998976kB
managed:15998976kB mlocked:64kB dirty:2240kB writeback:0kB
mapped:51564kB shmem:167888kB slab_reclaimable:0kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
bounce:59318392kB free_cma:64864kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[44086.590672] lowmem_reserve[]: 0 0 0 0
[44086.603158] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB (R) 0*128kB 0*256kB
0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 1088kB
[44086.639330] HighMem: 6*4kB (UMC) 2*8kB (U) 3*16kB (UMC) 5*32kB (UC)
3*64kB (UMC) 1*128kB (C) 1*256kB (M) 57*512kB (UM) 25*1024kB (UMC)
7*2048kB (UMC) 2594*4096kB (MRC) = 10694968kB
[44086.694225] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[44086.722524] 1324446 total pagecache pages
[44086.735973] 33 pages in swap cache
[44086.747384] Swap cache stats: add 741, delete 708, find 0/0
[44086.766072] Free swap = 497068kB
[44086.777185] Total swap = 500032kB
[44087.151253] 4194304 pages of RAM
[44087.162085] 2674498 free pages
[44087.172327] 52304 reserved pages
[44087.183150] 46573 slab pages
[44087.192810] 1004058 pages shared
[44087.203632] 33 pages swap cached
[44087.217489] Unable to handle kernel paging request at virtual address
b0000000
[44087.241776] pgd = c0303000
[44087.250887] [b0000000] *pgd=80000010306003, *pmd=00000000
[44087.269133] Internal error: Oops: 2a06 [#1] SMP ARM
[44087.285499] Modules linked in: fuse dm_mod uio_pdrv_genirq uio
nls_iso8859_1 nls_cp437 vfat fat sg st sr_mod cdrom af_packet squashfs
loop virtio_blk virtio_ring virtio brd marvell ahci_platform
libahci_platform libahci hip04_mdio gpio_dwapb hip04_eth [last unloaded:
dm_mod]
[44087.368299] CPU: 0 PID: 9 Comm: rcuos/0 Not tainted 3.18.0-5-lpae #1
[44087.389614] task: e88a46c0 ti: e88ac000 task.ti: e88ac000
[44087.407749] PC is at v7_dma_inv_range+0x34/0x4c
[44087.422951] LR is at dma_cache_maint_page+0x9c/0x118
[44087.439609] pc : [<c0340b94>] lr : [<c033a67c>] psr: 400e0013
[44087.439609] sp : e88addc0 ip : c0340c2c fp : 00000002
[44087.478104] r10: 00000000 r9 : c0e73580 r8 : c0e778c4
[44087.495630] r7 : c0fdfb80 r6 : c0f20780 r5 : 00000000 r4 : 00000640
[44087.517524] r3 : 0000003f r2 : 00000040 r1 : b0000640 r0 : b0000000
[44087.539424] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM
Segment kernel
[44087.563939] Control: 30c5387d Table: 21305b40 DAC: fffffffd
[44087.583213] Process rcuos/0 (pid: 9, stack limit = 0xe88ac238)
[44087.602777] Stack: (0xe88addc0 to 0xe88ae000)
[44087.617398] ddc0: 00000000 00000000 e88a4708 e8ae059c e8ae0000
c0e778c4 c0fdfb80 00000640
[44087.644829] dde0: 00000000 df7f9000 00000002 c033a91c c0340c2c
00000700 df9da740 e8ae059c
[44087.672259] de00: e8ae0000 000012ac 00000038 db903700 e8ae02d8
c0fdfb80 00000001 bf000498
[44087.699689] de20: 00000640 00000002 00000000 c0e9c5b0 e88ac000
00000101 c0e7ab04 00000000
[44087.727119] de40: 00000040 bf001e28 c0f77800 e8ae059c 00000000
00000040 0000012c df9da740
[44087.754549] de60: c0e6d100 00000101 e88ac000 c096c0fc c0f245e4
df9da748 007aae3c e88ac000
[44087.781978] de80: c0f26038 c0f2410d e88a5850 00000003 c0e6d08c
00000002 e88ac000 00000002
[44087.809407] dea0: c0f24314 00000101 e1f4d4c0 c0369508 00000000
c0e94fd4 0000000c c0e6d080
[44087.836837] dec0: c0e65448 0000000a c0f34000 c0e6d100 007aae3b
e88ac030 c0a62b84 00208040
[44087.864266] dee0: a00e0013 600e0013 df9d7680 000001ff 00000001
df9d7744 00000000 c0e73630
[44087.891696] df00: e1f4d4c0 c036976c e88ac010 c0369830 db903700
df9d7680 e88ac000 c03bc8cc
[44087.919125] df20: 00000000 d3fedb80 000b7df0 00000000 e88a46c0
c039db48 e88adf38 e88adf38
[44087.946554] df40: 00000000 e8838940 00000000 df9d7680 c03bc74c
00000000 00000000 00000000
[44087.973984] df60: 00000000 c0380398 00000000 00000000 00000001
df9d7680 00000000 00000000
[44088.001413] df80: e88adf80 e88adf80 00000000 00000000 e88adf90
e88adf90 e88adfac e8838940
[44088.028843] dfa0: c03802b8 00000000 00000000 c032bbc8 00000000
00000000 00000000 00000000
[44088.056272] dfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[44088.083701] dfe0: 00000000 00000000 00000000 00000000 00000013
00000000 e88adff4 00000000
[44088.111150] [<c0340b94>] (v7_dma_inv_range) from [<c033a67c>]
(dma_cache_maint_page+0x9c/0x118)
[44088.140339] [<c033a67c>] (dma_cache_maint_page) from [<c033a91c>]
(__dma_page_dev_to_cpu+0x90/0x110)
[44088.170996] [<c033a91c>] (__dma_page_dev_to_cpu) from [<bf000498>]
(hip04_rx_poll+0xe4/0x3cc [hip04_eth])
[44088.203119] [<bf000498>] (hip04_rx_poll [hip04_eth]) from
[<c096c0fc>] (net_rx_action+0x15c/0x258)
[44088.233185] [<c096c0fc>] (net_rx_action) from [<c0369508>]
(__do_softirq+0x138/0x2dc)
[44088.259457] [<c0369508>] (__do_softirq) from [<c036976c>]
(do_softirq+0x5c/0x64)
[44088.284270] [<c036976c>] (do_softirq) from [<c0369830>]
(__local_bh_enable_ip+0xbc/0xc0)
[44088.311428] [<c0369830>] (__local_bh_enable_ip) from [<c03bc8cc>]
(rcu_nocb_kthread+0x180/0x5bc)
[44088.340910] [<c03bc8cc>] (rcu_nocb_kthread) from [<c0380398>]
(kthread+0xe0/0xf8)
[44088.366026] [<c0380398>] (kthread) from [<c032bbc8>]
(ret_from_fork+0x14/0x20)
[44088.390260] Code: 1e070f3e e1110003 e1c11003 1e071f3e (ee070f36)
[44088.410808] ---[ end trace ccf8b617d193d373 ]---
[44088.426324] Kernel panic - not syncing: Fatal exception in interrupt
[44088.447652] Rebooting in 100 seconds..Reboot failed -- System halted
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
2015-01-14 8:06 ` Joe Perches
2015-01-14 8:53 ` Arnd Bergmann
@ 2015-01-14 16:34 ` Eric Dumazet
[not found] ` <1421253246.11734.17.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2015-01-15 4:39 ` Joe Perches
` (2 subsequent siblings)
5 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2015-01-14 16:34 UTC (permalink / raw)
To: Ding Tianhong
Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, xuwei5, zhangfei.gao, netdev, devicetree, linux
On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>
> v13: Fix the problem of alignment parameters for function and checkpatch warming.
>
> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
> for hip04 ethernet.
>
> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
> is not supported for this patch, but I think it could work for hip04,
> will support it later after some tests for performance better.
>
> Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
> it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>
> - Before:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
>
> - After:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
>
> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
> for v9 version
> - drop the workqueue
> - batch cleanup based on tx_coalesce_frames/usecs for better throughput
> - use a reasonable default tx timeout (200us, could be shorted
> based on measurements) with a range timer
> - fix napi poll function return value
> - use a lockless queue for cleanup
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
> drivers/net/ethernet/hisilicon/Makefile | 2 +-
> drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
> 2 files changed, 970 insertions(+), 1 deletion(-)
> create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>
> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
> index 40115a7..6c14540 100644
> --- a/drivers/net/ethernet/hisilicon/Makefile
> +++ b/drivers/net/ethernet/hisilicon/Makefile
> @@ -3,4 +3,4 @@
> #
>
> obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> new file mode 100644
> index 0000000..525214e
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> @@ -0,0 +1,969 @@
> +
> +/* Copyright (c) 2014 Linaro Ltd.
> + * Copyright (c) 2014 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/etherdevice.h>
> +#include <linux/platform_device.h>
> +#include <linux/interrupt.h>
> +#include <linux/ktime.h>
> +#include <linux/of_address.h>
> +#include <linux/phy.h>
> +#include <linux/of_mdio.h>
> +#include <linux/of_net.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/regmap.h>
> +
> +#define PPE_CFG_RX_ADDR 0x100
> +#define PPE_CFG_POOL_GRP 0x300
> +#define PPE_CFG_RX_BUF_SIZE 0x400
> +#define PPE_CFG_RX_FIFO_SIZE 0x500
> +#define PPE_CURR_BUF_CNT 0xa200
> +
> +#define GE_DUPLEX_TYPE 0x08
> +#define GE_MAX_FRM_SIZE_REG 0x3c
> +#define GE_PORT_MODE 0x40
> +#define GE_PORT_EN 0x44
> +#define GE_SHORT_RUNTS_THR_REG 0x50
> +#define GE_TX_LOCAL_PAGE_REG 0x5c
> +#define GE_TRANSMIT_CONTROL_REG 0x60
> +#define GE_CF_CRC_STRIP_REG 0x1b0
> +#define GE_MODE_CHANGE_REG 0x1b4
> +#define GE_RECV_CONTROL_REG 0x1e0
> +#define GE_STATION_MAC_ADDRESS 0x210
> +#define PPE_CFG_CPU_ADD_ADDR 0x580
> +#define PPE_CFG_MAX_FRAME_LEN_REG 0x408
> +#define PPE_CFG_BUS_CTRL_REG 0x424
> +#define PPE_CFG_RX_CTRL_REG 0x428
> +#define PPE_CFG_RX_PKT_MODE_REG 0x438
> +#define PPE_CFG_QOS_VMID_GEN 0x500
> +#define PPE_CFG_RX_PKT_INT 0x538
> +#define PPE_INTEN 0x600
> +#define PPE_INTSTS 0x608
> +#define PPE_RINT 0x604
> +#define PPE_CFG_STS_MODE 0x700
> +#define PPE_HIS_RX_PKT_CNT 0x804
> +
> +/* REG_INTERRUPT */
> +#define RCV_INT BIT(10)
> +#define RCV_NOBUF BIT(8)
> +#define RCV_DROP BIT(7)
> +#define TX_DROP BIT(6)
> +#define DEF_INT_ERR (RCV_NOBUF | RCV_DROP | TX_DROP)
> +#define DEF_INT_MASK (RCV_INT | DEF_INT_ERR)
> +
> +/* TX descriptor config */
> +#define TX_FREE_MEM BIT(0)
> +#define TX_READ_ALLOC_L3 BIT(1)
> +#define TX_FINISH_CACHE_INV BIT(2)
> +#define TX_CLEAR_WB BIT(4)
> +#define TX_L3_CHECKSUM BIT(5)
> +#define TX_LOOP_BACK BIT(11)
> +
> +/* RX error */
> +#define RX_PKT_DROP BIT(0)
> +#define RX_L2_ERR BIT(1)
> +#define RX_PKT_ERR (RX_PKT_DROP | RX_L2_ERR)
> +
> +#define SGMII_SPEED_1000 0x08
> +#define SGMII_SPEED_100 0x07
> +#define SGMII_SPEED_10 0x06
> +#define MII_SPEED_100 0x01
> +#define MII_SPEED_10 0x00
> +
> +#define GE_DUPLEX_FULL BIT(0)
> +#define GE_DUPLEX_HALF 0x00
> +#define GE_MODE_CHANGE_EN BIT(0)
> +
> +#define GE_TX_AUTO_NEG BIT(5)
> +#define GE_TX_ADD_CRC BIT(6)
> +#define GE_TX_SHORT_PAD_THROUGH BIT(7)
> +
> +#define GE_RX_STRIP_CRC BIT(0)
> +#define GE_RX_STRIP_PAD BIT(3)
> +#define GE_RX_PAD_EN BIT(4)
> +
> +#define GE_AUTO_NEG_CTL BIT(0)
> +
> +#define GE_RX_INT_THRESHOLD BIT(6)
> +#define GE_RX_TIMEOUT 0x04
> +
> +#define GE_RX_PORT_EN BIT(1)
> +#define GE_TX_PORT_EN BIT(2)
> +
> +#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12)
> +
> +#define PPE_CFG_RX_PKT_ALIGN BIT(18)
> +#define PPE_CFG_QOS_VMID_MODE BIT(14)
> +#define PPE_CFG_QOS_VMID_GRP_SHIFT 8
> +
> +#define PPE_CFG_RX_FIFO_FSFU BIT(11)
> +#define PPE_CFG_RX_DEPTH_SHIFT 16
> +#define PPE_CFG_RX_START_SHIFT 0
> +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT 11
> +
> +#define PPE_CFG_BUS_LOCAL_REL BIT(14)
> +#define PPE_CFG_BUS_BIG_ENDIEN BIT(0)
> +
> +#define RX_DESC_NUM 128
> +#define TX_DESC_NUM 256
> +#define TX_NEXT(N) (((N) + 1) & (TX_DESC_NUM-1))
> +#define RX_NEXT(N) (((N) + 1) & (RX_DESC_NUM-1))
> +
> +#define GMAC_PPE_RX_PKT_MAX_LEN 379
> +#define GMAC_MAX_PKT_LEN 1516
> +#define GMAC_MIN_PKT_LEN 31
> +#define RX_BUF_SIZE 1600
> +#define RESET_TIMEOUT 1000
> +#define TX_TIMEOUT (6 * HZ)
> +
> +#define DRV_NAME "hip04-ether"
> +#define DRV_VERSION "v1.0"
> +
> +#define HIP04_MAX_TX_COALESCE_USECS 200
> +#define HIP04_MIN_TX_COALESCE_USECS 100
> +#define HIP04_MAX_TX_COALESCE_FRAMES 200
> +#define HIP04_MIN_TX_COALESCE_FRAMES 100
> +
> +struct tx_desc {
> + u32 send_addr;
__be32 send_adddr; ?
> + u32 send_size;
__be32
> + u32 next_addr;
__be32
> + u32 cfg;
__be32
> + u32 wb_addr;
__be32 wb_addr ?
> +} __aligned(64);
> +
> +struct rx_desc {
> + u16 reserved_16;
> + u16 pkt_len;
> + u32 reserve1[3];
> + u32 pkt_err;
> + u32 reserve2[4];
> +};
> +
> +struct hip04_priv {
> + void __iomem *base;
> + int phy_mode;
> + int chan;
> + unsigned int port;
> + unsigned int speed;
> + unsigned int duplex;
> + unsigned int reg_inten;
> +
> + struct napi_struct napi;
> + struct net_device *ndev;
> +
> + struct tx_desc *tx_desc;
> + dma_addr_t tx_desc_dma;
> + struct sk_buff *tx_skb[TX_DESC_NUM];
> + dma_addr_t tx_phys[TX_DESC_NUM];
This is not an efficient way to store skb/phys, as for each skb, info
will be store in 2 separate cache lines.
It would be better to use a
struct hip04_tx_desc {
struct sk_buff *skb;
dma_addr_t phys;
}
> + unsigned int tx_head;
> +
> + int tx_coalesce_frames;
> + int tx_coalesce_usecs;
> + struct hrtimer tx_coalesce_timer;
> +
> + unsigned char *rx_buf[RX_DESC_NUM];
> + dma_addr_t rx_phys[RX_DESC_NUM];
Same thing here : Use a struct to get better data locality.
> + unsigned int rx_head;
> + unsigned int rx_buf_size;
> +
> + struct device_node *phy_node;
> + struct phy_device *phy;
> + struct regmap *map;
> + struct work_struct tx_timeout_task;
> +
> + /* written only by tx cleanup */
> + unsigned int tx_tail ____cacheline_aligned_in_smp;
> +};
> +
> +static inline unsigned int tx_count(unsigned int head, unsigned int tail)
> +{
> + return (head - tail) % (TX_DESC_NUM - 1);
> +}
> +
> +static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + u32 val;
> +
> + priv->speed = speed;
> + priv->duplex = duplex;
> +
> + switch (priv->phy_mode) {
> + case PHY_INTERFACE_MODE_SGMII:
> + if (speed == SPEED_1000)
> + val = SGMII_SPEED_1000;
> + else if (speed == SPEED_100)
> + val = SGMII_SPEED_100;
> + else
> + val = SGMII_SPEED_10;
> + break;
> + case PHY_INTERFACE_MODE_MII:
> + if (speed == SPEED_100)
> + val = MII_SPEED_100;
> + else
> + val = MII_SPEED_10;
> + break;
> + default:
> + netdev_warn(ndev, "not supported mode\n");
> + val = MII_SPEED_10;
> + break;
> + }
> + writel_relaxed(val, priv->base + GE_PORT_MODE);
> +
> + val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
> + writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
> +
> + val = GE_MODE_CHANGE_EN;
> + writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
> +}
> +
> +static void hip04_reset_ppe(struct hip04_priv *priv)
> +{
> + u32 val, tmp, timeout = 0;
> +
> + do {
> + regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
> + regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
> + if (timeout++ > RESET_TIMEOUT)
> + break;
> + } while (val & 0xfff);
> +}
> +
> +static void hip04_config_fifo(struct hip04_priv *priv)
> +{
> + u32 val;
> +
> + val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
> + val |= PPE_CFG_STS_RX_PKT_CNT_RC;
> + writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
> +
> + val = BIT(priv->port);
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
> +
> + val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
> + val |= PPE_CFG_QOS_VMID_MODE;
> + writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
> +
> + val = RX_BUF_SIZE;
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
> +
> + val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
> + val |= PPE_CFG_RX_FIFO_FSFU;
> + val |= priv->chan << PPE_CFG_RX_START_SHIFT;
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
> +
> + val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
> + writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
> +
> + val = PPE_CFG_RX_PKT_ALIGN;
> + writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
> +
> + val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
> + writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
> +
> + val = GMAC_PPE_RX_PKT_MAX_LEN;
> + writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
> +
> + val = GMAC_MAX_PKT_LEN;
> + writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
> +
> + val = GMAC_MIN_PKT_LEN;
> + writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
> +
> + val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
> + val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
> + writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
> +
> + val = GE_RX_STRIP_CRC;
> + writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
> +
> + val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
> + val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
> + writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
> +
> + val = GE_AUTO_NEG_CTL;
> + writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
> +}
> +
> +static void hip04_mac_enable(struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + u32 val;
> +
> + /* enable tx & rx */
> + val = readl_relaxed(priv->base + GE_PORT_EN);
> + val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
> + writel_relaxed(val, priv->base + GE_PORT_EN);
> +
> + /* clear rx int */
> + val = RCV_INT;
> + writel_relaxed(val, priv->base + PPE_RINT);
> +
> + /* config recv int */
> + val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
> + writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
> +
> + /* enable interrupt */
> + priv->reg_inten = DEF_INT_MASK;
> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +}
> +
> +static void hip04_mac_disable(struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + u32 val;
> +
> + /* disable int */
> + priv->reg_inten &= ~(DEF_INT_MASK);
> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +
> + /* disable tx & rx */
> + val = readl_relaxed(priv->base + GE_PORT_EN);
> + val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
> + writel_relaxed(val, priv->base + GE_PORT_EN);
> +}
> +
> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> + writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
> +}
> +
> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
> +}
> +
> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> +{
> + return readl(priv->base + PPE_HIS_RX_PKT_CNT);
> +}
> +
> +static void hip04_update_mac_address(struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> +
> + writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
> + priv->base + GE_STATION_MAC_ADDRESS);
> + writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
> + (ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
> + priv->base + GE_STATION_MAC_ADDRESS + 4);
> +}
> +
> +static int hip04_set_mac_address(struct net_device *ndev, void *addr)
> +{
> + eth_mac_addr(ndev, addr);
> + hip04_update_mac_address(ndev);
> + return 0;
> +}
> +
> +static int hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + unsigned tx_tail = priv->tx_tail;
> + struct tx_desc *desc;
> + unsigned int bytes_compl = 0, pkts_compl = 0;
> + unsigned int count;
> +
> + smp_rmb();
> + count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
> + if (count == 0)
> + goto out;
> +
> + while (count) {
> + desc = &priv->tx_desc[tx_tail];
> + if (desc->send_addr != 0) {
> + if (force)
> + desc->send_addr = 0;
> + else
> + break;
> + }
> +
> + if (priv->tx_phys[tx_tail]) {
> + dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> + priv->tx_skb[tx_tail]->len,
> + DMA_TO_DEVICE);
> + priv->tx_phys[tx_tail] = 0;
> + }
> + pkts_compl++;
> + bytes_compl += priv->tx_skb[tx_tail]->len;
> + dev_kfree_skb(priv->tx_skb[tx_tail]);
> + priv->tx_skb[tx_tail] = NULL;
> + tx_tail = TX_NEXT(tx_tail);
> + count--;
> + }
> +
> + priv->tx_tail = tx_tail;
> + smp_wmb(); /* Ensure tx_tail visible to xmit */
> +
> +out:
> + if (pkts_compl || bytes_compl)
Testing bytes_compl should be enough : There is no way pkt_compl could
be 0 if bytes_compl is not 0.
> + netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> +
> + if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
> + netif_wake_queue(ndev);
> +
> + return count;
> +}
> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + struct net_device_stats *stats = &ndev->stats;
> + unsigned int tx_head = priv->tx_head, count;
> + struct tx_desc *desc = &priv->tx_desc[tx_head];
> + dma_addr_t phys;
> +
> + smp_rmb();
> + count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
> + if (count == (TX_DESC_NUM - 1)) {
> + netif_stop_queue(ndev);
> + return NETDEV_TX_BUSY;
> + }
> +
> + phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> + if (dma_mapping_error(&ndev->dev, phys)) {
> + dev_kfree_skb(skb);
> + return NETDEV_TX_OK;
> + }
> +
> + priv->tx_skb[tx_head] = skb;
> + priv->tx_phys[tx_head] = phys;
> + desc->send_addr = cpu_to_be32(phys);
> + desc->send_size = cpu_to_be32(skb->len);
> + desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
> + phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> + desc->wb_addr = cpu_to_be32(phys);
> + skb_tx_timestamp(skb);
> +
> + hip04_set_xmit_desc(priv, phys);
> + priv->tx_head = TX_NEXT(tx_head);
> + count++;
Starting from this point, skb might already have been freed by TX
completion.
Its racy to access skb->len
> + netdev_sent_queue(ndev, skb->len);
> +
> + stats->tx_bytes += skb->len;
> + stats->tx_packets++;
> +
> + /* Ensure tx_head update visible to tx reclaim */
> + smp_wmb();
> +
> + /* queue is getting full, better start cleaning up now */
> + if (count >= priv->tx_coalesce_frames) {
> + if (napi_schedule_prep(&priv->napi)) {
> + /* disable rx interrupt and timer */
> + priv->reg_inten &= ~(RCV_INT);
> + writel_relaxed(DEF_INT_MASK & ~RCV_INT,
> + priv->base + PPE_INTEN);
> + hrtimer_cancel(&priv->tx_coalesce_timer);
> + __napi_schedule(&priv->napi);
> + }
> + } else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
> + /* cleanup not pending yet, start a new timer */
> + hrtimer_start_expires(&priv->tx_coalesce_timer,
> + HRTIMER_MODE_REL);
> + }
> +
> + return NETDEV_TX_OK;
> +}
> +
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> + struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
> + struct net_device *ndev = priv->ndev;
> + struct net_device_stats *stats = &ndev->stats;
> + unsigned int cnt = hip04_recv_cnt(priv);
> + struct rx_desc *desc;
> + struct sk_buff *skb;
> + unsigned char *buf;
> + bool last = false;
> + dma_addr_t phys;
> + int rx = 0;
> + int tx_remaining;
> + u16 len;
> + u32 err;
> +
> + while (cnt && !last) {
> + buf = priv->rx_buf[priv->rx_head];
> + skb = build_skb(buf, priv->rx_buf_size);
> + if (unlikely(!skb))
> + net_dbg_ratelimited("build_skb failed\n");
Well, is skb is NULL, you're crashing later...
You really have to address a memory allocation error much better than
that !
> +
> + dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
> + RX_BUF_SIZE, DMA_FROM_DEVICE);
> + priv->rx_phys[priv->rx_head] = 0;
> +
> + desc = (struct rx_desc *)skb->data;
> + len = be16_to_cpu(desc->pkt_len);
> + err = be32_to_cpu(desc->pkt_err);
> +
> + if (0 == len) {
> + dev_kfree_skb_any(skb);
> + last = true;
> + } else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
> + dev_kfree_skb_any(skb);
> + stats->rx_dropped++;
> + stats->rx_errors++;
> + } else {
> + skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> + skb_put(skb, len);
> + skb->protocol = eth_type_trans(skb, ndev);
> + napi_gro_receive(&priv->napi, skb);
> + stats->rx_packets++;
> + stats->rx_bytes += len;
> + rx++;
> + }
> +
> + buf = netdev_alloc_frag(priv->rx_buf_size);
> + if (!buf)
> + goto done;
Same problem here : In case of memory allocation error, your driver is
totally screwed.
> + phys = dma_map_single(&ndev->dev, buf,
> + RX_BUF_SIZE, DMA_FROM_DEVICE);
> + if (dma_mapping_error(&ndev->dev, phys))
> + goto done;
Same problem here : You really have to recover properly.
> + priv->rx_buf[priv->rx_head] = buf;
> + priv->rx_phys[priv->rx_head] = phys;
> + hip04_set_recv_desc(priv, phys);
> +
> + priv->rx_head = RX_NEXT(priv->rx_head);
> + if (rx >= budget)
> + goto done;
> +
> + if (--cnt == 0)
> + cnt = hip04_recv_cnt(priv);
> + }
> +
> + if (!(priv->reg_inten & RCV_INT)) {
> + /* enable rx interrupt */
> + priv->reg_inten |= RCV_INT;
> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> + }
> + napi_complete(napi);
> +done:
> + /* clean up tx descriptors and start a new timer if necessary */
> + tx_remaining = hip04_tx_reclaim(ndev, false);
> + if (rx < budget && tx_remaining)
> + hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
> +
> + return rx;
> +}
> +
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 8:53 ` Arnd Bergmann
@ 2015-01-15 2:54 ` Ding Tianhong
2015-01-15 9:05 ` Arnd Bergmann
0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 2:54 UTC (permalink / raw)
To: Arnd Bergmann
Cc: robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On 2015/1/14 16:53, Arnd Bergmann wrote:
> On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
>> +#define HIP04_MAX_TX_COALESCE_USECS 200
>> +#define HIP04_MIN_TX_COALESCE_USECS 100
>> +#define HIP04_MAX_TX_COALESCE_FRAMES 200
>> +#define HIP04_MIN_TX_COALESCE_FRAMES 100
>
> It's not important, but in case you are creating another version of the
> patch, maybe the allowed range can be extended somewhat. The example values
> I picked when I sent my suggestion were really made up. It's great if
> they work fine, but users might want to tune this far more depending on
> their workloads, How about these
>
> #define HIP04_MAX_TX_COALESCE_USECS 100000
> #define HIP04_MIN_TX_COALESCE_USECS 1
> #define HIP04_MAX_TX_COALESCE_FRAMES (TX_DESC_NUM - 1)
> #define HIP04_MIN_TX_COALESCE_FRAMES 1
>
Is it really ok that the so big range may break the driver and hip04 could not work fine?
I am not sure it is ok, I will fix it in next version.
Ding
> Arnd
>
> .
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 8:06 ` Joe Perches
@ 2015-01-15 2:56 ` Ding Tianhong
0 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 2:56 UTC (permalink / raw)
To: Joe Perches
Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On 2015/1/14 16:06, Joe Perches wrote:
> On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>
> Single comment:
>
>> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> []
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
> []
>> + while (cnt && !last) {
>> + buf = priv->rx_buf[priv->rx_head];
>> + skb = build_skb(buf, priv->rx_buf_size);
>> + if (unlikely(!skb))
>> + net_dbg_ratelimited("build_skb failed\n");
>> +
>> + dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
>> + RX_BUF_SIZE, DMA_FROM_DEVICE);
>> + priv->rx_phys[priv->rx_head] = 0;
>> +
>> + desc = (struct rx_desc *)skb->data;
>
> Perhaps if (!skb) is possible, there shouldn't be
> a dereference of that known null here.
>
Yes, we should return and do something to avoid oops, thanks.
Ding
>
>
> .
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
` (2 preceding siblings ...)
2015-01-14 16:34 ` Eric Dumazet
@ 2015-01-15 4:39 ` Joe Perches
[not found] ` <1421296742.25598.3.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
2015-01-15 9:53 ` Arnd Bergmann
2015-01-19 18:11 ` Alexander Graf
5 siblings, 1 reply; 25+ messages in thread
From: Joe Perches @ 2015-01-15 4:39 UTC (permalink / raw)
To: Ding Tianhong
Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
Mostly trivial comments:
> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
[]
> +#define GMAC_MAX_PKT_LEN 1516
[]
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
[]
> + while (cnt && !last) {
[]
> + desc = (struct rx_desc *)skb->data;
> + len = be16_to_cpu(desc->pkt_len);
> + err = be32_to_cpu(desc->pkt_err);
> +
> + if (0 == len) {
> + dev_kfree_skb_any(skb);
> + last = true;
> + } else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
Is this ">=" correct? Maybe it should be ">" ?
[]
> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
> +{
> + struct net_device *ndev = (struct net_device *)dev_id;
Unnecessary cast of void *
[]
> +static int hip04_set_coalesce(struct net_device *netdev,
> + struct ethtool_coalesce *ec)
> +{
> + struct hip04_priv *priv = netdev_priv(netdev);
> +
> + /* Check not supported parameters */
> + if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
> + (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
> + (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
> + (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
> + (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
> + (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
> + (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
> + (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
> + (ec->tx_max_coalesced_frames_irq) ||
> + (ec->stats_block_coalesce_usecs) ||
> + (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
> + return -EOPNOTSUPP;
Rather than a somewhat haphazard mix of these values,
this might be simpler to read as something like:
/* Check not supported parameters */
if (ec->pkt_rate_low ||
ec->pkt_rate_high ||
ec->use_adaptive_rx_coalesce ||
ec->rx_coalesce_usecs ||
ec->rx_coalesce_usecs_low ||
ec->rx_coalesce_usecs_high ||
ec->rx_coalesce_usecs_irq ||
ec->rx_max_coalesced_frames ||
ec->rx_max_coalesced_frames_low ||
ec->rx_max_coalesced_frames_high ||
ec->rx_max_coalesced_frames_irq ||
ec->use_adaptive_tx_coalesce ||
ec->tx_coalesce_usecs_low ||
ec->tx_coalesce_usecs_high ||
ec->tx_max_coalesced_frames_low ||
ec->tx_max_coalesced_frames_high ||
ec->tx_max_coalesced_frames_irq ||
ec->stats_block_coalesce_usecs ||
ec->rate_sample_interval)
return -EOPNOTSUPP;
> +static void hip04_free_ring(struct net_device *ndev, struct device *d)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + int i;
> +
> + for (i = 0; i < RX_DESC_NUM; i++)
> + if (priv->rx_buf[i])
> + put_page(virt_to_head_page(priv->rx_buf[i]));
It's generally nicer to use braces around
for loops with single ifs.
> +
> + for (i = 0; i < TX_DESC_NUM; i++)
> + if (priv->tx_skb[i])
> + dev_kfree_skb_any(priv->tx_skb[i]);
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
@ 2015-01-15 8:37 ` Ding Tianhong
2015-01-15 9:42 ` Arnd Bergmann
0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 8:37 UTC (permalink / raw)
To: Alexander Graf, arnd, robh+dt, davem, grant.likely
Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
zhangfei.gao, netdev, devicetree, linux
On 2015/1/14 18:19, Alexander Graf wrote:
>
>
> On 14.01.15 07:34, Ding Tianhong wrote:
>> v13:
>> - Fix the problem of alignment parameters for function and checkpatch warming.
>>
>> v12:
>> - According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>> for hip04 ethernet.
>>
>> v11:
>> - Add ethtool support for tx coalecse getting and setting, the xmit_more
>> is not supported for this patch, but I think it could work for hip04,
>> will support it later after some tests for performance better.
>>
>> Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>> it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>>
>> - Before:
>> $ ping 192.168.1.1 ...
>> === 192.168.1.1 ping statistics ===
>> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>>
>> $ iperf -c 192.168.1.1 ...
>> [ ID] Interval Transfer Bandwidth
>> [ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
>>
>> - After:
>> $ ping 192.168.1.1 ...
>> === 192.168.1.1 ping statistics ===
>> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>>
>> $ iperf -c 192.168.1.1 ...
>> [ ID] Interval Transfer Bandwidth
>> [ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
>
> Using v11 on top of 3.18 and after generating some traffic on the
> network as well as compiling code on ahci in parallel I ran into the
> following OOPS. Any idea what exactly is going on?
>
>>From a 10000 feet perspective it looks like two problems to me
>
> 1) Allocation failure doesn't get handled properly somewhere
> 2) We fail to allocate with order=0 - I don't see why
>
is it easy to repetition this bug? how big is your memory on your board, is it happened in your previous hip04 driver?
ding
>
> Alex
>
>
> [44085.652301] ld: page allocation failure: order:0, mode:0x120
> [44085.671314] CPU: 0 PID: 5121 Comm: ld Not tainted 3.18.0-5-lpae #1
> [44085.692110] [<c0335814>] (unwind_backtrace) from [<c032fcc8>]
> (show_stack+0x18/0x1c)
> [44085.718109] [<c032fcc8>] (show_stack) from [<c0a50afc>]
> (dump_stack+0x88/0x98)
> [44085.742360] [<c0a50afc>] (dump_stack) from [<c0443f98>]
> (warn_alloc_failed+0xdc/0x124)
> [44085.768938] [<c0443f98>] (warn_alloc_failed) from [<c04469ec>]
> (__alloc_pages_nodemask+0x7ac/0xa8c)
> [44085.799305] [<c04469ec>] (__alloc_pages_nodemask) from [<c0959a88>]
> (__netdev_alloc_frag+0xac/0x17c)
> [44085.829994] [<c0959a88>] (__netdev_alloc_frag) from [<bf00051c>]
> (hip04_rx_poll+0x168/0x3cc [hip04_eth])
> [44085.861830] [<bf00051c>] (hip04_rx_poll [hip04_eth]) from
> [<c096c0fc>] (net_rx_action+0x15c/0x258)
> [44085.891897] [<c096c0fc>] (net_rx_action) from [<c0369508>]
> (__do_softirq+0x138/0x2dc)
> [44085.918171] [<c0369508>] (__do_softirq) from [<c0369920>]
> (irq_exit+0x80/0xb8)
> [44085.942408] [<c0369920>] (irq_exit) from [<c03b3ee0>]
> (__handle_domain_irq+0x68/0xa8)
> [44085.968685] [<c03b3ee0>] (__handle_domain_irq) from [<c03086ac>]
> (hip04_handle_irq+0x38/0x68)
> [44085.997296] [<c03086ac>] (hip04_handle_irq) from [<c0a56d48>]
> (__irq_usr+0x48/0x60)
> [44086.022977] Exception stack(0xd3fbbfb0 to 0xd3fbbff8)
> [44086.039925] bfa0: 03fa94f8
> 01fc2598 01fc2598 00000000
> [44086.067357] bfc0: 03fa9458 40482000 00000000 400878b0 004e46e8
> 00000040 01f61f80 00000013
> [44086.094785] bfe0: fbad2488 bed4c270 403aeb5c 403bbe84 200e0010 ffffffff
> [44086.116970] Mem-info:
> [44086.124597] DMA per-cpu:
> [44086.133098] CPU 0: hi: 186, btch: 31 usd: 191
> [44086.149167] HighMem per-cpu:
> [44086.158829] CPU 0: hi: 186, btch: 31 usd: 10
> [44086.174919] active_anon:78994 inactive_anon:57435 isolated_anon:0
> [44086.174919] active_file:721847 inactive_file:559931 isolated_file:0
> [44086.174919] unevictable:20 dirty:567 writeback:0 unstable:0
> [44086.174919] free:2674014 slab_reclaimable:41072 slab_unreclaimable:5704
> [44086.174919] mapped:13107 shmem:42709 pagetables:933 bounce:0
> [44086.174919] free_cma:16216
> [44086.286552] DMA free:1088kB min:3012kB low:3764kB high:4516kB
> active_anon:584kB inactive_anon:2948kB active_file:183224kB
> inactive_file:183268kB unevictable:16kB isolated(anon):0kB
> isolated(file):0kB present:778240kB managed:569024kB mlocked:16kB
> dirty:28kB writeback:0kB mapped:864kB shmem:2948kB
> slab_reclaimable:164288kB slab_unreclaimable:22816kB kernel_stack:776kB
> pagetables:3732kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? no
> [44086.427835] lowmem_reserve[]: 0 0 15624 15624
> [44086.442687] HighMem free:10694968kB min:512kB low:21708kB
> high:42908kB active_anon:315392kB inactive_anon:226792kB
> active_file:2704164kB inactive_file:2056456kB unevictable:64kB
> isolated(anon):0kB isolated(file):0kB present:15998976kB
> managed:15998976kB mlocked:64kB dirty:2240kB writeback:0kB
> mapped:51564kB shmem:167888kB slab_reclaimable:0kB
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:59318392kB free_cma:64864kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> [44086.590672] lowmem_reserve[]: 0 0 0 0
> [44086.603158] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB (R) 0*128kB 0*256kB
> 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 1088kB
> [44086.639330] HighMem: 6*4kB (UMC) 2*8kB (U) 3*16kB (UMC) 5*32kB (UC)
> 3*64kB (UMC) 1*128kB (C) 1*256kB (M) 57*512kB (UM) 25*1024kB (UMC)
> 7*2048kB (UMC) 2594*4096kB (MRC) = 10694968kB
> [44086.694225] Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=2048kB
> [44086.722524] 1324446 total pagecache pages
> [44086.735973] 33 pages in swap cache
> [44086.747384] Swap cache stats: add 741, delete 708, find 0/0
> [44086.766072] Free swap = 497068kB
> [44086.777185] Total swap = 500032kB
> [44087.151253] 4194304 pages of RAM
> [44087.162085] 2674498 free pages
> [44087.172327] 52304 reserved pages
> [44087.183150] 46573 slab pages
> [44087.192810] 1004058 pages shared
> [44087.203632] 33 pages swap cached
> [44087.217489] Unable to handle kernel paging request at virtual address
> b0000000
> [44087.241776] pgd = c0303000
> [44087.250887] [b0000000] *pgd=80000010306003, *pmd=00000000
> [44087.269133] Internal error: Oops: 2a06 [#1] SMP ARM
> [44087.285499] Modules linked in: fuse dm_mod uio_pdrv_genirq uio
> nls_iso8859_1 nls_cp437 vfat fat sg st sr_mod cdrom af_packet squashfs
> loop virtio_blk virtio_ring virtio brd marvell ahci_platform
> libahci_platform libahci hip04_mdio gpio_dwapb hip04_eth [last unloaded:
> dm_mod]
> [44087.368299] CPU: 0 PID: 9 Comm: rcuos/0 Not tainted 3.18.0-5-lpae #1
> [44087.389614] task: e88a46c0 ti: e88ac000 task.ti: e88ac000
> [44087.407749] PC is at v7_dma_inv_range+0x34/0x4c
> [44087.422951] LR is at dma_cache_maint_page+0x9c/0x118
> [44087.439609] pc : [<c0340b94>] lr : [<c033a67c>] psr: 400e0013
> [44087.439609] sp : e88addc0 ip : c0340c2c fp : 00000002
> [44087.478104] r10: 00000000 r9 : c0e73580 r8 : c0e778c4
> [44087.495630] r7 : c0fdfb80 r6 : c0f20780 r5 : 00000000 r4 : 00000640
> [44087.517524] r3 : 0000003f r2 : 00000040 r1 : b0000640 r0 : b0000000
> [44087.539424] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM
> Segment kernel
> [44087.563939] Control: 30c5387d Table: 21305b40 DAC: fffffffd
> [44087.583213] Process rcuos/0 (pid: 9, stack limit = 0xe88ac238)
> [44087.602777] Stack: (0xe88addc0 to 0xe88ae000)
> [44087.617398] ddc0: 00000000 00000000 e88a4708 e8ae059c e8ae0000
> c0e778c4 c0fdfb80 00000640
> [44087.644829] dde0: 00000000 df7f9000 00000002 c033a91c c0340c2c
> 00000700 df9da740 e8ae059c
> [44087.672259] de00: e8ae0000 000012ac 00000038 db903700 e8ae02d8
> c0fdfb80 00000001 bf000498
> [44087.699689] de20: 00000640 00000002 00000000 c0e9c5b0 e88ac000
> 00000101 c0e7ab04 00000000
> [44087.727119] de40: 00000040 bf001e28 c0f77800 e8ae059c 00000000
> 00000040 0000012c df9da740
> [44087.754549] de60: c0e6d100 00000101 e88ac000 c096c0fc c0f245e4
> df9da748 007aae3c e88ac000
> [44087.781978] de80: c0f26038 c0f2410d e88a5850 00000003 c0e6d08c
> 00000002 e88ac000 00000002
> [44087.809407] dea0: c0f24314 00000101 e1f4d4c0 c0369508 00000000
> c0e94fd4 0000000c c0e6d080
> [44087.836837] dec0: c0e65448 0000000a c0f34000 c0e6d100 007aae3b
> e88ac030 c0a62b84 00208040
> [44087.864266] dee0: a00e0013 600e0013 df9d7680 000001ff 00000001
> df9d7744 00000000 c0e73630
> [44087.891696] df00: e1f4d4c0 c036976c e88ac010 c0369830 db903700
> df9d7680 e88ac000 c03bc8cc
> [44087.919125] df20: 00000000 d3fedb80 000b7df0 00000000 e88a46c0
> c039db48 e88adf38 e88adf38
> [44087.946554] df40: 00000000 e8838940 00000000 df9d7680 c03bc74c
> 00000000 00000000 00000000
> [44087.973984] df60: 00000000 c0380398 00000000 00000000 00000001
> df9d7680 00000000 00000000
> [44088.001413] df80: e88adf80 e88adf80 00000000 00000000 e88adf90
> e88adf90 e88adfac e8838940
> [44088.028843] dfa0: c03802b8 00000000 00000000 c032bbc8 00000000
> 00000000 00000000 00000000
> [44088.056272] dfc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [44088.083701] dfe0: 00000000 00000000 00000000 00000000 00000013
> 00000000 e88adff4 00000000
> [44088.111150] [<c0340b94>] (v7_dma_inv_range) from [<c033a67c>]
> (dma_cache_maint_page+0x9c/0x118)
> [44088.140339] [<c033a67c>] (dma_cache_maint_page) from [<c033a91c>]
> (__dma_page_dev_to_cpu+0x90/0x110)
> [44088.170996] [<c033a91c>] (__dma_page_dev_to_cpu) from [<bf000498>]
> (hip04_rx_poll+0xe4/0x3cc [hip04_eth])
> [44088.203119] [<bf000498>] (hip04_rx_poll [hip04_eth]) from
> [<c096c0fc>] (net_rx_action+0x15c/0x258)
> [44088.233185] [<c096c0fc>] (net_rx_action) from [<c0369508>]
> (__do_softirq+0x138/0x2dc)
> [44088.259457] [<c0369508>] (__do_softirq) from [<c036976c>]
> (do_softirq+0x5c/0x64)
> [44088.284270] [<c036976c>] (do_softirq) from [<c0369830>]
> (__local_bh_enable_ip+0xbc/0xc0)
> [44088.311428] [<c0369830>] (__local_bh_enable_ip) from [<c03bc8cc>]
> (rcu_nocb_kthread+0x180/0x5bc)
> [44088.340910] [<c03bc8cc>] (rcu_nocb_kthread) from [<c0380398>]
> (kthread+0xe0/0xf8)
> [44088.366026] [<c0380398>] (kthread) from [<c032bbc8>]
> (ret_from_fork+0x14/0x20)
> [44088.390260] Code: 1e070f3e e1110003 e1c11003 1e071f3e (ee070f36)
> [44088.410808] ---[ end trace ccf8b617d193d373 ]---
> [44088.426324] Kernel panic - not syncing: Fatal exception in interrupt
> [44088.447652] Rebooting in 100 seconds..Reboot failed -- System halted
>
> .
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-15 2:54 ` Ding Tianhong
@ 2015-01-15 9:05 ` Arnd Bergmann
0 siblings, 0 replies; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15 9:05 UTC (permalink / raw)
To: Ding Tianhong
Cc: robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On Thursday 15 January 2015 10:54:24 Ding Tianhong wrote:
> On 2015/1/14 16:53, Arnd Bergmann wrote:
> > On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
> >> +#define HIP04_MAX_TX_COALESCE_USECS 200
> >> +#define HIP04_MIN_TX_COALESCE_USECS 100
> >> +#define HIP04_MAX_TX_COALESCE_FRAMES 200
> >> +#define HIP04_MIN_TX_COALESCE_FRAMES 100
> >
> > It's not important, but in case you are creating another version of the
> > patch, maybe the allowed range can be extended somewhat. The example values
> > I picked when I sent my suggestion were really made up. It's great if
> > they work fine, but users might want to tune this far more depending on
> > their workloads, How about these
> >
> > #define HIP04_MAX_TX_COALESCE_USECS 100000
> > #define HIP04_MIN_TX_COALESCE_USECS 1
> > #define HIP04_MAX_TX_COALESCE_FRAMES (TX_DESC_NUM - 1)
> > #define HIP04_MIN_TX_COALESCE_FRAMES 1
> >
>
> Is it really ok that the so big range may break the driver and hip04 could not work fine?
> I am not sure it is ok, I will fix it in next version.
Obviously, performance will suffer when you pick a bad setting for
a given workload. If the numbers are too low, you will increase
the CPU load but get better latency, so I see nothing wrong in
allowing '1' as the minimum. For the descriptor number, you can't
go above TX_DESC_NUM, but there is nothing wrong in going close to
it, it will just mean that the timer should fire before you get
there and you again get more interrupts.
The 100ms maximum delay is a bit extreme, and will definitely impact
TX latency in some workloads if a user sets this, but the system
will should still be usable, and I couldn't think of a better
limit. Feel free to set this to e.g. 10ms or 1ms if you feel more
comfortable with that.
For reference, with TX_DESC_NUM=256 and 1500 byte frames, you have
up to 300ms worth of data in a full tx queue on a 10mbit link,
or 3ms respectively for a 1gbit link. Because of BQL, the actual
queue length will normally be much shorter than this, but on a
tx-only workload won't shrink below the currently used
tx_coalesce_usecs value.
Arnd
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
2015-01-15 8:37 ` Ding Tianhong
@ 2015-01-15 9:42 ` Arnd Bergmann
2015-01-15 12:39 ` Alexander Graf
0 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15 9:42 UTC (permalink / raw)
To: Ding Tianhong
Cc: Alexander Graf, robh+dt, davem, grant.likely, sergei.shtylyov,
linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On Thursday 15 January 2015 16:37:23 Ding Tianhong wrote:
> On 2015/1/14 18:19, Alexander Graf wrote:
> >
> >>From a 10000 feet perspective it looks like two problems to me
> >
> > 1) Allocation failure doesn't get handled properly somewhere
This is the bug that Eric pointed out as well.
> > 2) We fail to allocate with order=0 - I don't see why
GFP_ATOMIC. When allocating from a the napi poll function in softirq
context, you have to use nonblocking allocations, which occasionally
fail. This should not cause any harm other than dropped packets.
> is it easy to repetition this bug? how big is your memory on your board,
> is it happened in your previous hip04 driver?
It should be independent of memory size, but may be more likely if you
don't have swap space configured.
Arnd
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
` (3 preceding siblings ...)
2015-01-15 4:39 ` Joe Perches
@ 2015-01-15 9:53 ` Arnd Bergmann
2015-01-19 18:11 ` Alexander Graf
5 siblings, 0 replies; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15 9:53 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Ding Tianhong, robh+dt, davem, grant.likely, agraf, devicetree,
linux, sergei.shtylyov, eric.dumazet, netdev, xuwei5,
zhangfei.gao
On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
> +static int hip04_set_coalesce(struct net_device *netdev,
> + struct ethtool_coalesce *ec)
> +{
> + struct hip04_priv *priv = netdev_priv(netdev);
> +
> + /* Check not supported parameters */
> + if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
> + (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
> + (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
> + (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
> + (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
> + (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
> + (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
> + (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
> + (ec->tx_max_coalesced_frames_irq) ||
> + (ec->stats_block_coalesce_usecs) ||
> + (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
> + return -EOPNOTSUPP;
> +
> + if ((ec->tx_coalesce_usecs > HIP04_MAX_TX_COALESCE_USECS ||
> + ec->tx_coalesce_usecs < HIP04_MIN_TX_COALESCE_USECS) ||
> + (ec->tx_max_coalesced_frames > HIP04_MAX_TX_COALESCE_FRAMES ||
> + ec->tx_max_coalesced_frames < HIP04_MIN_TX_COALESCE_FRAMES))
> + return -EINVAL;
> +
> + priv->tx_coalesce_usecs = ec->tx_coalesce_usecs;
> + priv->tx_coalesce_frames = ec->tx_max_coalesced_frames;
> +
> + return 0;
> +}
I just looked at this again and noticed that you fail to actually use the
tx_coalesce_usecs value anywhere, since you don't call hrtimer_set_expires_range()
any more.
Also, I guess it would be nice use the rx_coalesce_usecs_low and
tx_coalesce_usecs_high numbers instead of just a single number, as
hrtimer_set_expires_range takes two values already. Just do something
like
hrtimer_set_expires_range(&priv->tx_coalesce_timer,
priv->rx_coalesce_usecs_low,
priv->rx_coalesce_usecs_high -
priv->rx_coalesce_usecs_low);
and make sure the 'low' value is smaller than the 'high' one.
> + priv->tx_coalesce_frames = TX_DESC_NUM * 3 / 4;
> + priv->tx_coalesce_usecs = 200;
Related, instead of putting the raw numbers here, how about
adding the defaults along with the other macros we talked about:
#define HIP04_MAX_TX_COALESCE_USECS 10000
#define HIP04_MIN_TX_COALESCE_USECS 1
#define HIP04_MAX_TX_COALESCE_FRAMES (TX_DESC_NUM - 1)
#define HIP04_MIN_TX_COALESCE_FRAMES 1
#define HIP04_DEFAULT_TX_COALESCE_USECS_LOW 80
#define HIP04_DEFAULT_TX_COALESCE_USECS_HIGH 200
#define HIP04_DEFAULT_TX_COALESCE_FRAMES (TX_DESC_NUM / 2)
Arnd
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
[not found] ` <1421296742.25598.3.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
@ 2015-01-15 10:28 ` Ding Tianhong
0 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 10:28 UTC (permalink / raw)
To: Joe Perches
Cc: arnd-r2nGTMty4D4, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
agraf-l3A5Bk7waGM,
sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-lFZ/pmaqli7XmaaqVzeoHQ
On 2015/1/15 12:39, Joe Perches wrote:
> On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>
> Mostly trivial comments:
>
>> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
>
> []
>
>> +#define GMAC_MAX_PKT_LEN 1516
>
> []
>
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
> []
>> + while (cnt && !last) {
> []
>> + desc = (struct rx_desc *)skb->data;
>> + len = be16_to_cpu(desc->pkt_len);
>> + err = be32_to_cpu(desc->pkt_err);
>> +
>> + if (0 == len) {
>> + dev_kfree_skb_any(skb);
>> + last = true;
>> + } else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
>
> Is this ">=" correct? Maybe it should be ">" ?
>
> []
>
>> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
>> +{
>> + struct net_device *ndev = (struct net_device *)dev_id;
>
> Unnecessary cast of void *
>
> []
>
>> +static int hip04_set_coalesce(struct net_device *netdev,
>> + struct ethtool_coalesce *ec)
>> +{
>> + struct hip04_priv *priv = netdev_priv(netdev);
>> +
>> + /* Check not supported parameters */
>> + if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
>> + (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
>> + (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
>> + (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
>> + (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
>> + (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
>> + (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
>> + (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
>> + (ec->tx_max_coalesced_frames_irq) ||
>> + (ec->stats_block_coalesce_usecs) ||
>> + (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
>> + return -EOPNOTSUPP;
>
> Rather than a somewhat haphazard mix of these values,
> this might be simpler to read as something like:
>
> /* Check not supported parameters */
> if (ec->pkt_rate_low ||
> ec->pkt_rate_high ||
>
> ec->use_adaptive_rx_coalesce ||
> ec->rx_coalesce_usecs ||
> ec->rx_coalesce_usecs_low ||
> ec->rx_coalesce_usecs_high ||
> ec->rx_coalesce_usecs_irq ||
> ec->rx_max_coalesced_frames ||
> ec->rx_max_coalesced_frames_low ||
> ec->rx_max_coalesced_frames_high ||
> ec->rx_max_coalesced_frames_irq ||
>
> ec->use_adaptive_tx_coalesce ||
> ec->tx_coalesce_usecs_low ||
> ec->tx_coalesce_usecs_high ||
> ec->tx_max_coalesced_frames_low ||
> ec->tx_max_coalesced_frames_high ||
> ec->tx_max_coalesced_frames_irq ||
>
> ec->stats_block_coalesce_usecs ||
> ec->rate_sample_interval)
> return -EOPNOTSUPP;
>
>
>> +static void hip04_free_ring(struct net_device *ndev, struct device *d)
>> +{
>> + struct hip04_priv *priv = netdev_priv(ndev);
>> + int i;
>> +
>> + for (i = 0; i < RX_DESC_NUM; i++)
>> + if (priv->rx_buf[i])
>> + put_page(virt_to_head_page(priv->rx_buf[i]));
>
> It's generally nicer to use braces around
> for loops with single ifs.
>
>> +
>> + for (i = 0; i < TX_DESC_NUM; i++)
>> + if (priv->tx_skb[i])
>> + dev_kfree_skb_any(priv->tx_skb[i]);
>
>
>
OK, fix them later, thanks.
Ding
> .
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
[not found] ` <1421253246.11734.17.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
@ 2015-01-15 10:29 ` Ding Tianhong
2015-01-15 12:30 ` Arnd Bergmann
0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 10:29 UTC (permalink / raw)
To: Eric Dumazet
Cc: arnd-r2nGTMty4D4, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
agraf-l3A5Bk7waGM,
sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-lFZ/pmaqli7XmaaqVzeoHQ
On 2015/1/15 0:34, Eric Dumazet wrote:
> On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>>
>> v13: Fix the problem of alignment parameters for function and checkpatch warming.
>>
>> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>> for hip04 ethernet.
>>
>> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
>> is not supported for this patch, but I think it could work for hip04,
>> will support it later after some tests for performance better.
>>
>> Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>> it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>>
>> - Before:
>> $ ping 192.168.1.1 ...
>> === 192.168.1.1 ping statistics ===
>> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>>
>> $ iperf -c 192.168.1.1 ...
>> [ ID] Interval Transfer Bandwidth
>> [ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
>>
>> - After:
>> $ ping 192.168.1.1 ...
>> === 192.168.1.1 ping statistics ===
>> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>>
>> $ iperf -c 192.168.1.1 ...
>> [ ID] Interval Transfer Bandwidth
>> [ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
>>
>> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
>> for v9 version
>> - drop the workqueue
>> - batch cleanup based on tx_coalesce_frames/usecs for better throughput
>> - use a reasonable default tx timeout (200us, could be shorted
>> based on measurements) with a range timer
>> - fix napi poll function return value
>> - use a lockless queue for cleanup
>>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
>> Signed-off-by: Ding Tianhong <dingtianhong-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> ---
>> drivers/net/ethernet/hisilicon/Makefile | 2 +-
>> drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
>> 2 files changed, 970 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>>
>> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
>> index 40115a7..6c14540 100644
>> --- a/drivers/net/ethernet/hisilicon/Makefile
>> +++ b/drivers/net/ethernet/hisilicon/Makefile
>> @@ -3,4 +3,4 @@
>> #
>>
>> obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
>> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
>> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
>> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
>> new file mode 100644
>> index 0000000..525214e
>> --- /dev/null
>> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
>> @@ -0,0 +1,969 @@
>> +
>> +/* Copyright (c) 2014 Linaro Ltd.
>> + * Copyright (c) 2014 Hisilicon Limited.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/etherdevice.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/ktime.h>
>> +#include <linux/of_address.h>
>> +#include <linux/phy.h>
>> +#include <linux/of_mdio.h>
>> +#include <linux/of_net.h>
>> +#include <linux/mfd/syscon.h>
>> +#include <linux/regmap.h>
>> +
>> +#define PPE_CFG_RX_ADDR 0x100
>> +#define PPE_CFG_POOL_GRP 0x300
>> +#define PPE_CFG_RX_BUF_SIZE 0x400
>> +#define PPE_CFG_RX_FIFO_SIZE 0x500
>> +#define PPE_CURR_BUF_CNT 0xa200
>> +
>> +#define GE_DUPLEX_TYPE 0x08
>> +#define GE_MAX_FRM_SIZE_REG 0x3c
>> +#define GE_PORT_MODE 0x40
>> +#define GE_PORT_EN 0x44
>> +#define GE_SHORT_RUNTS_THR_REG 0x50
>> +#define GE_TX_LOCAL_PAGE_REG 0x5c
>> +#define GE_TRANSMIT_CONTROL_REG 0x60
>> +#define GE_CF_CRC_STRIP_REG 0x1b0
>> +#define GE_MODE_CHANGE_REG 0x1b4
>> +#define GE_RECV_CONTROL_REG 0x1e0
>> +#define GE_STATION_MAC_ADDRESS 0x210
>> +#define PPE_CFG_CPU_ADD_ADDR 0x580
>> +#define PPE_CFG_MAX_FRAME_LEN_REG 0x408
>> +#define PPE_CFG_BUS_CTRL_REG 0x424
>> +#define PPE_CFG_RX_CTRL_REG 0x428
>> +#define PPE_CFG_RX_PKT_MODE_REG 0x438
>> +#define PPE_CFG_QOS_VMID_GEN 0x500
>> +#define PPE_CFG_RX_PKT_INT 0x538
>> +#define PPE_INTEN 0x600
>> +#define PPE_INTSTS 0x608
>> +#define PPE_RINT 0x604
>> +#define PPE_CFG_STS_MODE 0x700
>> +#define PPE_HIS_RX_PKT_CNT 0x804
>> +
>> +/* REG_INTERRUPT */
>> +#define RCV_INT BIT(10)
>> +#define RCV_NOBUF BIT(8)
>> +#define RCV_DROP BIT(7)
>> +#define TX_DROP BIT(6)
>> +#define DEF_INT_ERR (RCV_NOBUF | RCV_DROP | TX_DROP)
>> +#define DEF_INT_MASK (RCV_INT | DEF_INT_ERR)
>> +
>> +/* TX descriptor config */
>> +#define TX_FREE_MEM BIT(0)
>> +#define TX_READ_ALLOC_L3 BIT(1)
>> +#define TX_FINISH_CACHE_INV BIT(2)
>> +#define TX_CLEAR_WB BIT(4)
>> +#define TX_L3_CHECKSUM BIT(5)
>> +#define TX_LOOP_BACK BIT(11)
>> +
>> +/* RX error */
>> +#define RX_PKT_DROP BIT(0)
>> +#define RX_L2_ERR BIT(1)
>> +#define RX_PKT_ERR (RX_PKT_DROP | RX_L2_ERR)
>> +
>> +#define SGMII_SPEED_1000 0x08
>> +#define SGMII_SPEED_100 0x07
>> +#define SGMII_SPEED_10 0x06
>> +#define MII_SPEED_100 0x01
>> +#define MII_SPEED_10 0x00
>> +
>> +#define GE_DUPLEX_FULL BIT(0)
>> +#define GE_DUPLEX_HALF 0x00
>> +#define GE_MODE_CHANGE_EN BIT(0)
>> +
>> +#define GE_TX_AUTO_NEG BIT(5)
>> +#define GE_TX_ADD_CRC BIT(6)
>> +#define GE_TX_SHORT_PAD_THROUGH BIT(7)
>> +
>> +#define GE_RX_STRIP_CRC BIT(0)
>> +#define GE_RX_STRIP_PAD BIT(3)
>> +#define GE_RX_PAD_EN BIT(4)
>> +
>> +#define GE_AUTO_NEG_CTL BIT(0)
>> +
>> +#define GE_RX_INT_THRESHOLD BIT(6)
>> +#define GE_RX_TIMEOUT 0x04
>> +
>> +#define GE_RX_PORT_EN BIT(1)
>> +#define GE_TX_PORT_EN BIT(2)
>> +
>> +#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12)
>> +
>> +#define PPE_CFG_RX_PKT_ALIGN BIT(18)
>> +#define PPE_CFG_QOS_VMID_MODE BIT(14)
>> +#define PPE_CFG_QOS_VMID_GRP_SHIFT 8
>> +
>> +#define PPE_CFG_RX_FIFO_FSFU BIT(11)
>> +#define PPE_CFG_RX_DEPTH_SHIFT 16
>> +#define PPE_CFG_RX_START_SHIFT 0
>> +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT 11
>> +
>> +#define PPE_CFG_BUS_LOCAL_REL BIT(14)
>> +#define PPE_CFG_BUS_BIG_ENDIEN BIT(0)
>> +
>> +#define RX_DESC_NUM 128
>> +#define TX_DESC_NUM 256
>> +#define TX_NEXT(N) (((N) + 1) & (TX_DESC_NUM-1))
>> +#define RX_NEXT(N) (((N) + 1) & (RX_DESC_NUM-1))
>> +
>> +#define GMAC_PPE_RX_PKT_MAX_LEN 379
>> +#define GMAC_MAX_PKT_LEN 1516
>> +#define GMAC_MIN_PKT_LEN 31
>> +#define RX_BUF_SIZE 1600
>> +#define RESET_TIMEOUT 1000
>> +#define TX_TIMEOUT (6 * HZ)
>> +
>> +#define DRV_NAME "hip04-ether"
>> +#define DRV_VERSION "v1.0"
>> +
>> +#define HIP04_MAX_TX_COALESCE_USECS 200
>> +#define HIP04_MIN_TX_COALESCE_USECS 100
>> +#define HIP04_MAX_TX_COALESCE_FRAMES 200
>> +#define HIP04_MIN_TX_COALESCE_FRAMES 100
>> +
>> +struct tx_desc {
>> + u32 send_addr;
>
> __be32 send_adddr; ?
>
>> + u32 send_size;
>
> __be32
>
>> + u32 next_addr;
> __be32
>
>> + u32 cfg;
> __be32
>
>> + u32 wb_addr;
> __be32 wb_addr ?
>
>> +} __aligned(64);
>> +
>> +struct rx_desc {
>> + u16 reserved_16;
>> + u16 pkt_len;
>> + u32 reserve1[3];
>> + u32 pkt_err;
>> + u32 reserve2[4];
>> +};
>> +
>> +struct hip04_priv {
>> + void __iomem *base;
>> + int phy_mode;
>> + int chan;
>> + unsigned int port;
>> + unsigned int speed;
>> + unsigned int duplex;
>> + unsigned int reg_inten;
>> +
>> + struct napi_struct napi;
>> + struct net_device *ndev;
>> +
>> + struct tx_desc *tx_desc;
>> + dma_addr_t tx_desc_dma;
>> + struct sk_buff *tx_skb[TX_DESC_NUM];
>> + dma_addr_t tx_phys[TX_DESC_NUM];
>
> This is not an efficient way to store skb/phys, as for each skb, info
> will be store in 2 separate cache lines.
>
> It would be better to use a
>
> struct hip04_tx_desc {
> struct sk_buff *skb;
> dma_addr_t phys;
> }
>
>> + unsigned int tx_head;
>> +
>> + int tx_coalesce_frames;
>> + int tx_coalesce_usecs;
>> + struct hrtimer tx_coalesce_timer;
>> +
>> + unsigned char *rx_buf[RX_DESC_NUM];
>> + dma_addr_t rx_phys[RX_DESC_NUM];
>
> Same thing here : Use a struct to get better data locality.
>
>> + unsigned int rx_head;
>> + unsigned int rx_buf_size;
>> +
>> + struct device_node *phy_node;
>> + struct phy_device *phy;
>> + struct regmap *map;
>> + struct work_struct tx_timeout_task;
>> +
>> + /* written only by tx cleanup */
>> + unsigned int tx_tail ____cacheline_aligned_in_smp;
>> +};
>> +
>> +static inline unsigned int tx_count(unsigned int head, unsigned int tail)
>> +{
>> + return (head - tail) % (TX_DESC_NUM - 1);
>> +}
>> +
>> +static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
>> +{
>> + struct hip04_priv *priv = netdev_priv(ndev);
>> + u32 val;
>> +
>> + priv->speed = speed;
>> + priv->duplex = duplex;
>> +
>> + switch (priv->phy_mode) {
>> + case PHY_INTERFACE_MODE_SGMII:
>> + if (speed == SPEED_1000)
>> + val = SGMII_SPEED_1000;
>> + else if (speed == SPEED_100)
>> + val = SGMII_SPEED_100;
>> + else
>> + val = SGMII_SPEED_10;
>> + break;
>> + case PHY_INTERFACE_MODE_MII:
>> + if (speed == SPEED_100)
>> + val = MII_SPEED_100;
>> + else
>> + val = MII_SPEED_10;
>> + break;
>> + default:
>> + netdev_warn(ndev, "not supported mode\n");
>> + val = MII_SPEED_10;
>> + break;
>> + }
>> + writel_relaxed(val, priv->base + GE_PORT_MODE);
>> +
>> + val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
>> + writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
>> +
>> + val = GE_MODE_CHANGE_EN;
>> + writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
>> +}
>> +
>> +static void hip04_reset_ppe(struct hip04_priv *priv)
>> +{
>> + u32 val, tmp, timeout = 0;
>> +
>> + do {
>> + regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
>> + regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
>> + if (timeout++ > RESET_TIMEOUT)
>> + break;
>> + } while (val & 0xfff);
>> +}
>> +
>> +static void hip04_config_fifo(struct hip04_priv *priv)
>> +{
>> + u32 val;
>> +
>> + val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
>> + val |= PPE_CFG_STS_RX_PKT_CNT_RC;
>> + writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
>> +
>> + val = BIT(priv->port);
>> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
>> +
>> + val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
>> + val |= PPE_CFG_QOS_VMID_MODE;
>> + writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
>> +
>> + val = RX_BUF_SIZE;
>> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
>> +
>> + val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
>> + val |= PPE_CFG_RX_FIFO_FSFU;
>> + val |= priv->chan << PPE_CFG_RX_START_SHIFT;
>> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
>> +
>> + val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
>> + writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
>> +
>> + val = PPE_CFG_RX_PKT_ALIGN;
>> + writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
>> +
>> + val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
>> + writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
>> +
>> + val = GMAC_PPE_RX_PKT_MAX_LEN;
>> + writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
>> +
>> + val = GMAC_MAX_PKT_LEN;
>> + writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
>> +
>> + val = GMAC_MIN_PKT_LEN;
>> + writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
>> +
>> + val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
>> + val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
>> + writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
>> +
>> + val = GE_RX_STRIP_CRC;
>> + writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
>> +
>> + val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
>> + val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
>> + writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
>> +
>> + val = GE_AUTO_NEG_CTL;
>> + writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
>> +}
>> +
>> +static void hip04_mac_enable(struct net_device *ndev)
>> +{
>> + struct hip04_priv *priv = netdev_priv(ndev);
>> + u32 val;
>> +
>> + /* enable tx & rx */
>> + val = readl_relaxed(priv->base + GE_PORT_EN);
>> + val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
>> + writel_relaxed(val, priv->base + GE_PORT_EN);
>> +
>> + /* clear rx int */
>> + val = RCV_INT;
>> + writel_relaxed(val, priv->base + PPE_RINT);
>> +
>> + /* config recv int */
>> + val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
>> + writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
>> +
>> + /* enable interrupt */
>> + priv->reg_inten = DEF_INT_MASK;
>> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +}
>> +
>> +static void hip04_mac_disable(struct net_device *ndev)
>> +{
>> + struct hip04_priv *priv = netdev_priv(ndev);
>> + u32 val;
>> +
>> + /* disable int */
>> + priv->reg_inten &= ~(DEF_INT_MASK);
>> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +
>> + /* disable tx & rx */
>> + val = readl_relaxed(priv->base + GE_PORT_EN);
>> + val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
>> + writel_relaxed(val, priv->base + GE_PORT_EN);
>> +}
>> +
>> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> + writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
>> +}
>> +
>> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
>> +}
>> +
>> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
>> +{
>> + return readl(priv->base + PPE_HIS_RX_PKT_CNT);
>> +}
>> +
>> +static void hip04_update_mac_address(struct net_device *ndev)
>> +{
>> + struct hip04_priv *priv = netdev_priv(ndev);
>> +
>> + writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
>> + priv->base + GE_STATION_MAC_ADDRESS);
>> + writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
>> + (ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
>> + priv->base + GE_STATION_MAC_ADDRESS + 4);
>> +}
>> +
>> +static int hip04_set_mac_address(struct net_device *ndev, void *addr)
>> +{
>> + eth_mac_addr(ndev, addr);
>> + hip04_update_mac_address(ndev);
>> + return 0;
>> +}
>> +
>> +static int hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> + struct hip04_priv *priv = netdev_priv(ndev);
>> + unsigned tx_tail = priv->tx_tail;
>> + struct tx_desc *desc;
>> + unsigned int bytes_compl = 0, pkts_compl = 0;
>> + unsigned int count;
>> +
>> + smp_rmb();
>> + count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
>> + if (count == 0)
>> + goto out;
>> +
>> + while (count) {
>> + desc = &priv->tx_desc[tx_tail];
>> + if (desc->send_addr != 0) {
>> + if (force)
>> + desc->send_addr = 0;
>> + else
>> + break;
>> + }
>> +
>> + if (priv->tx_phys[tx_tail]) {
>> + dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> + priv->tx_skb[tx_tail]->len,
>> + DMA_TO_DEVICE);
>> + priv->tx_phys[tx_tail] = 0;
>> + }
>> + pkts_compl++;
>> + bytes_compl += priv->tx_skb[tx_tail]->len;
>> + dev_kfree_skb(priv->tx_skb[tx_tail]);
>> + priv->tx_skb[tx_tail] = NULL;
>> + tx_tail = TX_NEXT(tx_tail);
>> + count--;
>> + }
>> +
>> + priv->tx_tail = tx_tail;
>> + smp_wmb(); /* Ensure tx_tail visible to xmit */
>> +
>> +out:
>> + if (pkts_compl || bytes_compl)
>
> Testing bytes_compl should be enough : There is no way pkt_compl could
> be 0 if bytes_compl is not 0.
>
>> + netdev_completed_queue(ndev, pkts_compl, bytes_compl);
>> +
>> + if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
>> + netif_wake_queue(ndev);
>> +
>> + return count;
>> +}
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> + struct hip04_priv *priv = netdev_priv(ndev);
>> + struct net_device_stats *stats = &ndev->stats;
>> + unsigned int tx_head = priv->tx_head, count;
>> + struct tx_desc *desc = &priv->tx_desc[tx_head];
>> + dma_addr_t phys;
>> +
>> + smp_rmb();
>> + count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
>> + if (count == (TX_DESC_NUM - 1)) {
>> + netif_stop_queue(ndev);
>> + return NETDEV_TX_BUSY;
>> + }
>> +
>> + phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> + if (dma_mapping_error(&ndev->dev, phys)) {
>> + dev_kfree_skb(skb);
>> + return NETDEV_TX_OK;
>> + }
>> +
>> + priv->tx_skb[tx_head] = skb;
>> + priv->tx_phys[tx_head] = phys;
>> + desc->send_addr = cpu_to_be32(phys);
>> + desc->send_size = cpu_to_be32(skb->len);
>> + desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
>> + phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> + desc->wb_addr = cpu_to_be32(phys);
>> + skb_tx_timestamp(skb);
>> +
>> + hip04_set_xmit_desc(priv, phys);
>> + priv->tx_head = TX_NEXT(tx_head);
>> + count++;
>
> Starting from this point, skb might already have been freed by TX
> completion.
>
> Its racy to access skb->len
>
>> + netdev_sent_queue(ndev, skb->len);
>> +
>> + stats->tx_bytes += skb->len;
>> + stats->tx_packets++;
>> +
>> + /* Ensure tx_head update visible to tx reclaim */
>> + smp_wmb();
>> +
>> + /* queue is getting full, better start cleaning up now */
>> + if (count >= priv->tx_coalesce_frames) {
>> + if (napi_schedule_prep(&priv->napi)) {
>> + /* disable rx interrupt and timer */
>> + priv->reg_inten &= ~(RCV_INT);
>> + writel_relaxed(DEF_INT_MASK & ~RCV_INT,
>> + priv->base + PPE_INTEN);
>> + hrtimer_cancel(&priv->tx_coalesce_timer);
>> + __napi_schedule(&priv->napi);
>> + }
>> + } else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
>> + /* cleanup not pending yet, start a new timer */
>> + hrtimer_start_expires(&priv->tx_coalesce_timer,
>> + HRTIMER_MODE_REL);
>> + }
>> +
>> + return NETDEV_TX_OK;
>> +}
>> +
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> + struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
>> + struct net_device *ndev = priv->ndev;
>> + struct net_device_stats *stats = &ndev->stats;
>> + unsigned int cnt = hip04_recv_cnt(priv);
>> + struct rx_desc *desc;
>> + struct sk_buff *skb;
>> + unsigned char *buf;
>> + bool last = false;
>> + dma_addr_t phys;
>> + int rx = 0;
>> + int tx_remaining;
>> + u16 len;
>> + u32 err;
>> +
>> + while (cnt && !last) {
>> + buf = priv->rx_buf[priv->rx_head];
>> + skb = build_skb(buf, priv->rx_buf_size);
>> + if (unlikely(!skb))
>> + net_dbg_ratelimited("build_skb failed\n");
>
> Well, is skb is NULL, you're crashing later...
> You really have to address a memory allocation error much better than
> that !
>
>> +
>> + dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
>> + RX_BUF_SIZE, DMA_FROM_DEVICE);
>> + priv->rx_phys[priv->rx_head] = 0;
>> +
>> + desc = (struct rx_desc *)skb->data;
>> + len = be16_to_cpu(desc->pkt_len);
>> + err = be32_to_cpu(desc->pkt_err);
>> +
>> + if (0 == len) {
>> + dev_kfree_skb_any(skb);
>> + last = true;
>> + } else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
>> + dev_kfree_skb_any(skb);
>> + stats->rx_dropped++;
>> + stats->rx_errors++;
>> + } else {
>> + skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> + skb_put(skb, len);
>> + skb->protocol = eth_type_trans(skb, ndev);
>> + napi_gro_receive(&priv->napi, skb);
>> + stats->rx_packets++;
>> + stats->rx_bytes += len;
>> + rx++;
>> + }
>> +
>> + buf = netdev_alloc_frag(priv->rx_buf_size);
>> + if (!buf)
>> + goto done;
>
> Same problem here : In case of memory allocation error, your driver is
> totally screwed.
>
>> + phys = dma_map_single(&ndev->dev, buf,
>> + RX_BUF_SIZE, DMA_FROM_DEVICE);
>> + if (dma_mapping_error(&ndev->dev, phys))
>> + goto done;
>
> Same problem here : You really have to recover properly.
>
>> + priv->rx_buf[priv->rx_head] = buf;
>> + priv->rx_phys[priv->rx_head] = phys;
>> + hip04_set_recv_desc(priv, phys);
>> +
>> + priv->rx_head = RX_NEXT(priv->rx_head);
>> + if (rx >= budget)
>> + goto done;
>> +
>> + if (--cnt == 0)
>> + cnt = hip04_recv_cnt(priv);
>> + }
>> +
>> + if (!(priv->reg_inten & RCV_INT)) {
>> + /* enable rx interrupt */
>> + priv->reg_inten |= RCV_INT;
>> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> + }
>> + napi_complete(napi);
>> +done:
>> + /* clean up tx descriptors and start a new timer if necessary */
>> + tx_remaining = hip04_tx_reclaim(ndev, false);
>> + if (rx < budget && tx_remaining)
>> + hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
>> +
>> + return rx;
>> +}
>> +
Yes, thanks, fix them later.
Ding
>
>
> .
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-15 10:29 ` Ding Tianhong
@ 2015-01-15 12:30 ` Arnd Bergmann
0 siblings, 0 replies; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15 12:30 UTC (permalink / raw)
To: Ding Tianhong
Cc: Eric Dumazet, robh+dt, davem, grant.likely, agraf,
sergei.shtylyov, linux-arm-kernel, xuwei5, zhangfei.gao, netdev,
devicetree, linux
On Thursday 15 January 2015 18:29:55 Ding Tianhong wrote:
>
> Yes, thanks, fix them later.
Please note that now that as v13 has now shown up in net-next, I guess
all other patches need to be based on top of what is already merged,
and address one issue at a time.
There is one more issue I saw during build testing, which is the
lack of a MODULE_LICENSE statement for the main driver, although
there is one in the mdio driver.
Arnd
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
2015-01-15 9:42 ` Arnd Bergmann
@ 2015-01-15 12:39 ` Alexander Graf
[not found] ` <54B7B4E7.8030108-l3A5Bk7waGM@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Alexander Graf @ 2015-01-15 12:39 UTC (permalink / raw)
To: Arnd Bergmann, Ding Tianhong
Cc: robh+dt, davem, grant.likely, sergei.shtylyov, linux-arm-kernel,
eric.dumazet, xuwei5, zhangfei.gao, netdev, devicetree, linux
On 15.01.15 10:42, Arnd Bergmann wrote:
> On Thursday 15 January 2015 16:37:23 Ding Tianhong wrote:
>> On 2015/1/14 18:19, Alexander Graf wrote:
>>>
>>> >From a 10000 feet perspective it looks like two problems to me
>>>
>>> 1) Allocation failure doesn't get handled properly somewhere
>
> This is the bug that Eric pointed out as well.
>
>>> 2) We fail to allocate with order=0 - I don't see why
>
> GFP_ATOMIC. When allocating from a the napi poll function in softirq
> context, you have to use nonblocking allocations, which occasionally
> fail. This should not cause any harm other than dropped packets.
>
>> is it easy to repetition this bug? how big is your memory on your board,
>> is it happened in your previous hip04 driver?
>
> It should be independent of memory size, but may be more likely if you
> don't have swap space configured.
With the previous driver I was unable to get this far - I ended up in
random memory corruption and had a ~90% packet loss after about an hour
of uptime.
I'm not sure whether it's easy to reproduce, I merely started up a few
VMs, did some disk I/O and started to compile QEMU in the background ;).
I'll happily give your follow up patch that's going to fix the memory
allocation problems a try though.
Alex
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
[not found] ` <54B7B4E7.8030108-l3A5Bk7waGM@public.gmane.org>
@ 2015-01-15 12:56 ` Ding Tianhong
0 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 12:56 UTC (permalink / raw)
To: Alexander Graf, Arnd Bergmann
Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
grant.likely-QSEj5FYQhm4dnm+yROfE0A,
sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-lFZ/pmaqli7XmaaqVzeoHQ
On 2015/1/15 20:39, Alexander Graf wrote:
>
>
> On 15.01.15 10:42, Arnd Bergmann wrote:
>> On Thursday 15 January 2015 16:37:23 Ding Tianhong wrote:
>>> On 2015/1/14 18:19, Alexander Graf wrote:
>>>>
>>>> >From a 10000 feet perspective it looks like two problems to me
>>>>
>>>> 1) Allocation failure doesn't get handled properly somewhere
>>
>> This is the bug that Eric pointed out as well.
>>
>>>> 2) We fail to allocate with order=0 - I don't see why
>>
>> GFP_ATOMIC. When allocating from a the napi poll function in softirq
>> context, you have to use nonblocking allocations, which occasionally
>> fail. This should not cause any harm other than dropped packets.
>>
>>> is it easy to repetition this bug? how big is your memory on your board,
>>> is it happened in your previous hip04 driver?
>>
>> It should be independent of memory size, but may be more likely if you
>> don't have swap space configured.
>
> With the previous driver I was unable to get this far - I ended up in
> random memory corruption and had a ~90% packet loss after about an hour
> of uptime.
>
> I'm not sure whether it's easy to reproduce, I merely started up a few
> VMs, did some disk I/O and started to compile QEMU in the background ;).
>
> I'll happily give your follow up patch that's going to fix the memory
> allocation problems a try though.
>
Thanks for your work, I am sure we could make it much more better:)
Ding
>
> Alex
>
> .
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
` (4 preceding siblings ...)
2015-01-15 9:53 ` Arnd Bergmann
@ 2015-01-19 18:11 ` Alexander Graf
[not found] ` <54BD48BF.3050707-l3A5Bk7waGM@public.gmane.org>
5 siblings, 1 reply; 25+ messages in thread
From: Alexander Graf @ 2015-01-19 18:11 UTC (permalink / raw)
To: Ding Tianhong, arnd, robh+dt, davem, grant.likely
Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
zhangfei.gao, netdev, devicetree, linux
On 01/14/15 07:34, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>
> v13: Fix the problem of alignment parameters for function and checkpatch warming.
>
> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
> for hip04 ethernet.
>
> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
> is not supported for this patch, but I think it could work for hip04,
> will support it later after some tests for performance better.
>
> Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
> it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>
> - Before:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
>
> - After:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
>
> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
> for v9 version
> - drop the workqueue
> - batch cleanup based on tx_coalesce_frames/usecs for better throughput
> - use a reasonable default tx timeout (200us, could be shorted
> based on measurements) with a range timer
> - fix napi poll function return value
> - use a lockless queue for cleanup
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
> drivers/net/ethernet/hisilicon/Makefile | 2 +-
> drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
> 2 files changed, 970 insertions(+), 1 deletion(-)
> create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>
> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
> index 40115a7..6c14540 100644
> --- a/drivers/net/ethernet/hisilicon/Makefile
> +++ b/drivers/net/ethernet/hisilicon/Makefile
> @@ -3,4 +3,4 @@
> #
>
> obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
[...]
> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
> +{
> + struct net_device *ndev = (struct net_device *)dev_id;
> + struct hip04_priv *priv = netdev_priv(ndev);
> + struct net_device_stats *stats = &ndev->stats;
> + u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
> +
> + if (!ists)
> + return IRQ_NONE;
> +
> + writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
> +
> + if (unlikely(ists & DEF_INT_ERR)) {
> + if (ists & (RCV_NOBUF | RCV_DROP))
> + stats->rx_errors++;
> + stats->rx_dropped++;
> + netdev_err(ndev, "rx drop\n");
After hammering on the box a bit again, I'm in a situation where I get
lots of
[302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
and I really am getting a lot of drops - I can't even ping the machine
anymore.
However, as it is there's a good chance the machine is simply
unreachable because it's busy writing to the UART, and even if not all
useful messages indicating anything have scrolled out. I really don't
think you should emit any message over and over again to the user. Once
or twice is enough.
Please make sure to rate limit it.
Alex
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
[not found] ` <54BD48BF.3050707-l3A5Bk7waGM@public.gmane.org>
@ 2015-01-19 20:34 ` Arnd Bergmann
2015-01-20 2:15 ` Ding Tianhong
0 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-19 20:34 UTC (permalink / raw)
To: Alexander Graf
Cc: Ding Tianhong, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-lFZ/pmaqli7XmaaqVzeoHQ
On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
>
> After hammering on the box a bit again, I'm in a situation where I get
> lots of
>
> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
>
> and I really am getting a lot of drops - I can't even ping the machine
> anymore.
>
> However, as it is there's a good chance the machine is simply
> unreachable because it's busy writing to the UART, and even if not all
> useful messages indicating anything have scrolled out. I really don't
> think you should emit any message over and over again to the user. Once
> or twice is enough.
>
> Please make sure to rate limit it.
I would argue that packet loss is not an error condition at all
and you should not print this at netdev_err() level. You could make
this a netdev_dbg(), or just make it silent because it's already
counted in the statistics.
Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-19 20:34 ` Arnd Bergmann
@ 2015-01-20 2:15 ` Ding Tianhong
[not found] ` <54BDBA29.10703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-20 2:15 UTC (permalink / raw)
To: Arnd Bergmann, Alexander Graf
Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
grant.likely-QSEj5FYQhm4dnm+yROfE0A,
sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-lFZ/pmaqli7XmaaqVzeoHQ
On 2015/1/20 4:34, Arnd Bergmann wrote:
> On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
>>
>> After hammering on the box a bit again, I'm in a situation where I get
>> lots of
>>
>> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
>>
>> and I really am getting a lot of drops - I can't even ping the machine
>> anymore.
>>
>> However, as it is there's a good chance the machine is simply
>> unreachable because it's busy writing to the UART, and even if not all
>> useful messages indicating anything have scrolled out. I really don't
>> think you should emit any message over and over again to the user. Once
>> or twice is enough.
>>
>> Please make sure to rate limit it.
>
> I would argue that packet loss is not an error condition at all
> and you should not print this at netdev_err() level. You could make
> this a netdev_dbg(), or just make it silent because it's already
> counted in the statistics.
>
I think something wrong with Graf's board, I will try to make it happen on my board, and
in any case I will add rate limit to xx_drop and change to dbg log level.
Thanks
Ding
> Arnd
>
> .
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
[not found] ` <54BDBA29.10703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2015-01-20 12:01 ` Arnd Bergmann
2015-01-20 18:12 ` Joe Perches
0 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-20 12:01 UTC (permalink / raw)
To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Cc: Ding Tianhong, Alexander Graf, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-lFZ/pmaqli7XmaaqVzeoHQ,
sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
netdev-u79uwXL29TY76Z2rM5mHXA, xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
grant.likely-QSEj5FYQhm4dnm+yROfE0A,
zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q
On Tuesday 20 January 2015 10:15:05 Ding Tianhong wrote:
> On 2015/1/20 4:34, Arnd Bergmann wrote:
> > On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
> >>
> >> After hammering on the box a bit again, I'm in a situation where I get
> >> lots of
> >>
> >> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
> >>
> >> and I really am getting a lot of drops - I can't even ping the machine
> >> anymore.
> >>
> >> However, as it is there's a good chance the machine is simply
> >> unreachable because it's busy writing to the UART, and even if not all
> >> useful messages indicating anything have scrolled out. I really don't
> >> think you should emit any message over and over again to the user. Once
> >> or twice is enough.
> >>
> >> Please make sure to rate limit it.
> >
> > I would argue that packet loss is not an error condition at all
> > and you should not print this at netdev_err() level. You could make
> > this a netdev_dbg(), or just make it silent because it's already
> > counted in the statistics.
> >
>
> I think something wrong with Graf's board, I will try to make it happen on my board, and
> in any case I will add rate limit to xx_drop and change to dbg log level.
I've looked at the interrupt handling in more detail and came up
with this patch. Please review, and forward if you are happy with the
changes.
Alex, could you try if this solves your problem?
8<----
Subject: [PATCH] net/hip04: refactor interrupt masking
The hip04 ethernet driver currently acknowledges all interrupts directly
in the interrupt handler, and leaves all interrupts except the RX data
enabled the whole time. This causes multiple problems:
- When more packets come in between the original interrupt and the
NAPI poll function, we will get an extraneous RX interrupt as soon
as interrupts are enabled again.
- The two error interrupts are by definition combining all errors that
may have happened since the last time they were handled, but just
acknowledging the irq without dealing with the cause of the condition
makes it come back immediately. In particular, when NAPI is intentionally
stalling the rx queue, this causes a storm of "rx dropped" messages.
- The access to the 'reg_inten' field in hip_priv is used for serializing
access, but is in fact racy itself.
To deal with these issues, the driver is changed to only acknowledge
the IRQ at the point when it is being handled. The RX interrupts now get
acked right before reading the number of received frames to keep spurious
interrupts to the absolute minimum without losing interrupts.
As there is no tx-complete interrupt, only an informational tx-dropped
one, we now ack that in the tx reclaim handler, hoping that clearing
the descriptors has also eliminated the error condition.
The only place that reads the reg_inten now just relies on
napi_schedule_prep() to return whether NAPI is active or not, and
the poll() function is then going to ack and reenabled the IRQ if
necessary.
Finally, when we disable interrupts, they are now all disabled
together, in order to simplify the logic and to avoid getting
rx-dropped interrupts when NAPI is in control of the rx queue.
Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 525214ef5984..83773247a368 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -56,6 +56,8 @@
#define RCV_DROP BIT(7)
#define TX_DROP BIT(6)
#define DEF_INT_ERR (RCV_NOBUF | RCV_DROP | TX_DROP)
+#define DEF_INT_RX (RCV_INT | RCV_NOBUF | RCV_DROP)
+#define DEF_INT_TX (TX_DROP)
#define DEF_INT_MASK (RCV_INT | DEF_INT_ERR)
/* TX descriptor config */
@@ -154,7 +156,6 @@ struct hip04_priv {
unsigned int port;
unsigned int speed;
unsigned int duplex;
- unsigned int reg_inten;
struct napi_struct napi;
struct net_device *ndev;
@@ -303,17 +304,15 @@ static void hip04_mac_enable(struct net_device *ndev)
val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
writel_relaxed(val, priv->base + GE_PORT_EN);
- /* clear rx int */
- val = RCV_INT;
- writel_relaxed(val, priv->base + PPE_RINT);
+ /* clear prior interrupts */
+ writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
/* config recv int */
val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
/* enable interrupt */
- priv->reg_inten = DEF_INT_MASK;
- writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+ writel_relaxed(DEF_INT_MASK, priv->base + PPE_INTEN);
}
static void hip04_mac_disable(struct net_device *ndev)
@@ -322,8 +321,7 @@ static void hip04_mac_disable(struct net_device *ndev)
u32 val;
/* disable int */
- priv->reg_inten &= ~(DEF_INT_MASK);
- writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+ writel_relaxed(0, priv->base + PPE_INTEN);
/* disable tx & rx */
val = readl_relaxed(priv->base + GE_PORT_EN);
@@ -403,6 +401,8 @@ static int hip04_tx_reclaim(struct net_device *ndev, bool force)
priv->tx_tail = tx_tail;
smp_wmb(); /* Ensure tx_tail visible to xmit */
+ writel_relaxed(DEF_INT_TX, priv->base + PPE_RINT);
+
out:
if (pkts_compl || bytes_compl)
netdev_completed_queue(ndev, pkts_compl, bytes_compl);
@@ -458,9 +458,7 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
if (count >= priv->tx_coalesce_frames) {
if (napi_schedule_prep(&priv->napi)) {
/* disable rx interrupt and timer */
- priv->reg_inten &= ~(RCV_INT);
- writel_relaxed(DEF_INT_MASK & ~RCV_INT,
- priv->base + PPE_INTEN);
+ writel_relaxed(0, priv->base + PPE_INTEN);
hrtimer_cancel(&priv->tx_coalesce_timer);
__napi_schedule(&priv->napi);
}
@@ -478,7 +476,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
struct net_device *ndev = priv->ndev;
struct net_device_stats *stats = &ndev->stats;
- unsigned int cnt = hip04_recv_cnt(priv);
+ unsigned int cnt;
struct rx_desc *desc;
struct sk_buff *skb;
unsigned char *buf;
@@ -489,6 +487,10 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
u16 len;
u32 err;
+ /* acknowledge interrupts and read status */
+ writel_relaxed(DEF_INT_RX, priv->base + PPE_RINT);
+ cnt = hip04_recv_cnt(priv);
+
while (cnt && !last) {
buf = priv->rx_buf[priv->rx_head];
skb = build_skb(buf, priv->rx_buf_size);
@@ -539,11 +541,8 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
cnt = hip04_recv_cnt(priv);
}
- if (!(priv->reg_inten & RCV_INT)) {
- /* enable rx interrupt */
- priv->reg_inten |= RCV_INT;
- writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
- }
+ /* enable rx interrupt */
+ writel_relaxed(DEF_INT_MASK, priv->base + PPE_INTEN);
napi_complete(napi);
done:
/* clean up tx descriptors and start a new timer if necessary */
@@ -564,23 +563,21 @@ static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
if (!ists)
return IRQ_NONE;
- writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
-
if (unlikely(ists & DEF_INT_ERR)) {
- if (ists & (RCV_NOBUF | RCV_DROP))
+ if (ists & (RCV_NOBUF | RCV_DROP)) {
stats->rx_errors++;
stats->rx_dropped++;
- netdev_err(ndev, "rx drop\n");
+ netdev_dbg(ndev, "rx drop\n");
+ }
if (ists & TX_DROP) {
stats->tx_dropped++;
- netdev_err(ndev, "tx drop\n");
+ netdev_dbg(ndev, "tx drop\n");
}
}
- if (ists & RCV_INT && napi_schedule_prep(&priv->napi)) {
- /* disable rx interrupt */
- priv->reg_inten &= ~(RCV_INT);
- writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+ if (napi_schedule_prep(&priv->napi)) {
+ /* disable interrupt */
+ writel_relaxed(0, priv->base + PPE_INTEN);
hrtimer_cancel(&priv->tx_coalesce_timer);
__napi_schedule(&priv->napi);
}
@@ -596,8 +593,7 @@ enum hrtimer_restart tx_done(struct hrtimer *hrtimer)
if (napi_schedule_prep(&priv->napi)) {
/* disable rx interrupt */
- priv->reg_inten &= ~(RCV_INT);
- writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+ writel_relaxed(0, priv->base + PPE_INTEN);
__napi_schedule(&priv->napi);
}
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
2015-01-20 12:01 ` Arnd Bergmann
@ 2015-01-20 18:12 ` Joe Perches
0 siblings, 0 replies; 25+ messages in thread
From: Joe Perches @ 2015-01-20 18:12 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Ding Tianhong,
Alexander Graf, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-lFZ/pmaqli7XmaaqVzeoHQ,
sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
netdev-u79uwXL29TY76Z2rM5mHXA, xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
grant.likely-QSEj5FYQhm4dnm+yROfE0A,
zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q
On Tue, 2015-01-20 at 13:01 +0100, Arnd Bergmann wrote:
> On Tuesday 20 January 2015 10:15:05 Ding Tianhong wrote:
> > On 2015/1/20 4:34, Arnd Bergmann wrote:
> > > On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
> > >>
> > >> After hammering on the box a bit again, I'm in a situation where I get
> > >> lots of
> > >>
> > >> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
> > >>
> > >> and I really am getting a lot of drops - I can't even ping the machine
> > >> anymore.
> > >>
> > >> However, as it is there's a good chance the machine is simply
> > >> unreachable because it's busy writing to the UART, and even if not all
> > >> useful messages indicating anything have scrolled out. I really don't
> > >> think you should emit any message over and over again to the user. Once
> > >> or twice is enough.
[]
> The hip04 ethernet driver currently acknowledges all interrupts directly
> in the interrupt handler, and leaves all interrupts except the RX data
> enabled the whole time. This causes multiple problems:
[]
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
[]
> @@ -564,23 +563,21 @@ static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
> if (!ists)
> return IRQ_NONE;
>
> - writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
> -
> if (unlikely(ists & DEF_INT_ERR)) {
> - if (ists & (RCV_NOBUF | RCV_DROP))
> + if (ists & (RCV_NOBUF | RCV_DROP)) {
> stats->rx_errors++;
> stats->rx_dropped++;
> - netdev_err(ndev, "rx drop\n"
> + netdev_dbg(ndev, "rx drop\n");
> + }
> if (ists & TX_DROP) {
> stats->tx_dropped++;
> - netdev_err(ndev, "tx drop\n");
> + netdev_dbg(ndev, "tx drop\n");
> }
> }
>
While these are dubious messages to output at all, it
probably would benefit to use net_ratelimit() before the
netdev_dbg() and maybe output the counter as well:
if (...) {
stats++
if (net_ratelimit())
netdev_dbg(ndev, "[rt]x drop: %u\n", stats);
}
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2015-01-20 18:12 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-14 6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver Ding Tianhong
2015-01-14 6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
2015-01-14 8:06 ` Joe Perches
2015-01-15 2:56 ` Ding Tianhong
2015-01-14 8:53 ` Arnd Bergmann
2015-01-15 2:54 ` Ding Tianhong
2015-01-15 9:05 ` Arnd Bergmann
2015-01-14 16:34 ` Eric Dumazet
[not found] ` <1421253246.11734.17.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2015-01-15 10:29 ` Ding Tianhong
2015-01-15 12:30 ` Arnd Bergmann
2015-01-15 4:39 ` Joe Perches
[not found] ` <1421296742.25598.3.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
2015-01-15 10:28 ` Ding Tianhong
2015-01-15 9:53 ` Arnd Bergmann
2015-01-19 18:11 ` Alexander Graf
[not found] ` <54BD48BF.3050707-l3A5Bk7waGM@public.gmane.org>
2015-01-19 20:34 ` Arnd Bergmann
2015-01-20 2:15 ` Ding Tianhong
[not found] ` <54BDBA29.10703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2015-01-20 12:01 ` Arnd Bergmann
2015-01-20 18:12 ` Joe Perches
2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
2015-01-15 8:37 ` Ding Tianhong
2015-01-15 9:42 ` Arnd Bergmann
2015-01-15 12:39 ` Alexander Graf
[not found] ` <54B7B4E7.8030108-l3A5Bk7waGM@public.gmane.org>
2015-01-15 12:56 ` Ding Tianhong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).