Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Nicolas Dichtel @ 2015-01-13  8:29 UTC (permalink / raw)
  To: David Miller
  Cc: tgraf, jesse, stephen, pshelar, therbert, alexei.starovoitov,
	netdev, dev
In-Reply-To: <20150112.125929.266649587719513897.davem@davemloft.net>

Le 12/01/2015 18:59, David Miller a écrit :
>
> Can you PLEASE, PLEASE, not quote and entire full patch just to add two
> lines of commentary.
>
> Quote _only_ the _RELEVANT_ portions of the email you are replying to.
>
Will do, sorry for that.

^ permalink raw reply

* Re: Fwd: [rhashtable] WARNING: CPU: 0 PID: 10 at kernel/locking/mutex.c:570 mutex_lock_nested()
From: Thomas Graf @ 2015-01-13  8:41 UTC (permalink / raw)
  To: Ying Xue; +Cc: linux-kernel, lkp, Netdev
In-Reply-To: <54B4CE3E.8020009@windriver.com>

On 01/13/15 at 03:50pm, Ying Xue wrote:
> On 01/12/2015 08:42 PM, Thomas Graf wrote:
> > On 01/12/15 at 09:38am, Ying Xue wrote:
> >> Hi Thomas,
> >>
> >> I am really unable to see where is wrong leading to below warning
> >> complaints. Can you please help me check it?
> > 
> > Not sure yet. It's not your patch that introduced the issue though.
> > It merely exposed the affected code path.
> >
> > Just wondering, did you test with CONFIG_DEBUG_MUTEXES enabled?
> > 
> > 
> 
> After I enable above option, I don't find similar complaints during my
> testing.

I can't reproduce it in my KVM box either so far. It looks like a
mutex_lock() on an uninitialized mutex or use after free but I can't
find such a code path so far.

^ permalink raw reply

* [PATCH net-next] rhashtable: unnecessary to use delayed work
From: Ying Xue @ 2015-01-13  9:00 UTC (permalink / raw)
  To: tgraf; +Cc: davem, netdev

When we put our declared work task in the global workqueue with
schedule_delayed_work(), its delay parameter is always zero.
Therefore, we should define a normal work in rhashtable structure
instead of a delayed work.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/linux/rhashtable.h |    2 +-
 lib/rhashtable.c           |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 9570832..a2562ed 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -119,7 +119,7 @@ struct rhashtable {
 	atomic_t			nelems;
 	atomic_t			shift;
 	struct rhashtable_params	p;
-	struct delayed_work             run_work;
+	struct work_struct		run_work;
 	struct mutex                    mutex;
 	bool                            being_destroyed;
 };
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index ed6ae1a..a7959ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -476,7 +476,7 @@ static void rht_deferred_worker(struct work_struct *work)
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
 
-	ht = container_of(work, struct rhashtable, run_work.work);
+	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
 	tbl = rht_dereference(ht->tbl, ht);
 
@@ -498,7 +498,7 @@ static void rhashtable_wakeup_worker(struct rhashtable *ht)
 	if (tbl == new_tbl &&
 	    ((ht->p.grow_decision && ht->p.grow_decision(ht, size)) ||
 	     (ht->p.shrink_decision && ht->p.shrink_decision(ht, size))))
-		schedule_delayed_work(&ht->run_work, 0);
+		schedule_work(&ht->run_work);
 }
 
 static void __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
@@ -894,7 +894,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
 		get_random_bytes(&ht->p.hash_rnd, sizeof(ht->p.hash_rnd));
 
 	if (ht->p.grow_decision || ht->p.shrink_decision)
-		INIT_DEFERRABLE_WORK(&ht->run_work, rht_deferred_worker);
+		INIT_WORK(&ht->run_work, rht_deferred_worker);
 
 	return 0;
 }
@@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
 
 	mutex_lock(&ht->mutex);
 
-	cancel_delayed_work(&ht->run_work);
+	cancel_work_sync(&ht->run_work);
 	bucket_table_free(rht_dereference(ht->tbl, ht));
 
 	mutex_unlock(&ht->mutex);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next] tipc: remove redundant timer defined in tipc_sock struct
From: Ying Xue @ 2015-01-13  9:07 UTC (permalink / raw)
  To: davem; +Cc: jon.maloy, netdev, ericalp, Paul.Gortmaker, tipc-discussion

Remove the redundant timer defined in tipc_sock structure, instead we
can directly reuse the sk_timer defined in sock structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 2cec496..c9c34a6 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -69,7 +69,6 @@
  * @pub_count: total # of publications port has made during its lifetime
  * @probing_state:
  * @probing_intv:
- * @timer:
  * @port: port - interacts with 'sk' and with the rest of the TIPC stack
  * @peer_name: the peer of the connection, if any
  * @conn_timeout: the time we can wait for an unresponded setup request
@@ -94,7 +93,6 @@ struct tipc_sock {
 	u32 pub_count;
 	u32 probing_state;
 	unsigned long probing_intv;
-	struct timer_list timer;
 	uint conn_timeout;
 	atomic_t dupl_rcvcnt;
 	bool link_cong;
@@ -360,7 +358,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 		return -EINVAL;
 	}
 	msg_set_origport(msg, tsk->portid);
-	setup_timer(&tsk->timer, tipc_sk_timeout, (unsigned long)tsk);
+	setup_timer(&sk->sk_timer, tipc_sk_timeout, (unsigned long)tsk);
 	sk->sk_backlog_rcv = tipc_backlog_rcv;
 	sk->sk_rcvbuf = sysctl_tipc_rmem[1];
 	sk->sk_data_ready = tipc_data_ready;
@@ -514,7 +512,8 @@ static int tipc_release(struct socket *sock)
 
 	tipc_sk_withdraw(tsk, 0, NULL);
 	probing_state = tsk->probing_state;
-	if (del_timer_sync(&tsk->timer) && probing_state != TIPC_CONN_PROBING)
+	if (del_timer_sync(&sk->sk_timer) &&
+	    probing_state != TIPC_CONN_PROBING)
 		sock_put(sk);
 	tipc_sk_remove(tsk);
 	if (tsk->connected) {
@@ -1136,7 +1135,8 @@ static int tipc_send_packet(struct kiocb *iocb, struct socket *sock,
 static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
 				u32 peer_node)
 {
-	struct net *net = sock_net(&tsk->sk);
+	struct sock *sk = &tsk->sk;
+	struct net *net = sock_net(sk);
 	struct tipc_msg *msg = &tsk->phdr;
 
 	msg_set_destnode(msg, peer_node);
@@ -1148,8 +1148,7 @@ static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
 	tsk->probing_intv = CONN_PROBING_INTERVAL;
 	tsk->probing_state = TIPC_CONN_OK;
 	tsk->connected = 1;
-	if (!mod_timer(&tsk->timer, jiffies + tsk->probing_intv))
-		sock_hold(&tsk->sk);
+	sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);
 	tipc_node_add_conn(net, peer_node, tsk->portid, peer_port);
 	tsk->max_pkt = tipc_node_get_mtu(net, peer_node, tsk->portid);
 }
@@ -2141,8 +2140,7 @@ static void tipc_sk_timeout(unsigned long data)
 				      0, peer_node, tn->own_addr,
 				      peer_port, tsk->portid, TIPC_OK);
 		tsk->probing_state = TIPC_CONN_PROBING;
-		if (!mod_timer(&tsk->timer, jiffies + tsk->probing_intv))
-			sock_hold(sk);
+		sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);
 	}
 	bh_unlock_sock(sk);
 	if (skb)
-- 
1.7.9.5


------------------------------------------------------------------------------
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet

^ permalink raw reply related

* [PATCH net-next v12 0/3] add hisilicon hip04 ethernet driver
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd-r2nGTMty4D4, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	agraf-l3A5Bk7waGM
  Cc: sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ

v12:
- According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
  for hip04 ethernet.

v11:
- Add ethtool support for tx coalecse getting and setting, the xmit_more
  is not supported for this patch, but I think it could work for hip04,
  will support it later after some tests for performance better.

  Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
  it looks that the performance and latency is more better by tx_coalesce_frames/usecs.

  - Before:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

  - After:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10:
- According Arnd's suggestion, remove the skb_orphan and use the hrtimer
  for the cleanup of the TX queue and add some modification for the hip04
  drivers.
  1) drop the broken skb_orphan call
  2) drop the workqueue
  3) batch cleanup based on tx_coalesce_frames/usecs for better throughput
  4) use a reasonable default tx timeout (200us, could be shorted
     based on measurements) with a range timer
  5) fix napi poll function return value
  6) use a lockless queue for cleanup

v9:
- There is no tx completion interrupts to free DMAd Tx packets, it means taht
  we rely on new tx packets arriving to run the destructors of completed packets,
  which open up space in their sockets's send queues. Sometimes we don't get such
  new packets causing Tx to stall, a single UDP transmitter is a good example of
  this situation, so we need a clean up workqueue to reclaims completed packets,
  the workqueue will only free the last packets which is already stay for several jiffies.
  Also fix some format cleanups.

v8:
- Use poll to reclaim xmitted buffer as workaround since no tx done interrupt 

v7:
- Remove select NET_CORE in 0002

v6:
- Suggest by Russell: Use netdev_sent_queue & netdev_completed_queue to solve latency issue 
  Also shorten the period of timer, which is used to wakeup the queue since no
  tx completed interrupt.

v5:
- no big change, fix typo

v4:
- Modify accoringly to the suggetion from Arnd, Florian, Eric, David
  Use of_parse_phandle_with_fixed_args & syscon_node_to_regmap get ppe info
  Add skb_orphan() and tx_timer for reclaim since no tx_finished interrupt
  Update timeout, and move of_phy_connect to probe to reuse open/stop

v3:
- Suggest from Arnd, use syscon & regmap_write/read to replace static void __iomem *ppebase.
  Modify hisilicon-hip04-net.txt accrordingly to suggestion from Florian and Sergei.

v2:
- Got many suggestions from Russell, Arnd, Florian, Mark and Sergei
  Remove memcpy, use dma_map/unmap_single, use dma_alloc_coherent rather than dma_pool, etc.
  Refer property in ethernet.txt, change ppe description, etc.

Ding Tianhong (1):
  net: hisilicon: new hip04 ethernet driver

Zhangfei Gao (2):
  Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  net: hisilicon: new hip04 MDIO driver

 .../bindings/net/hisilicon-hip04-net.txt           |  88 ++
 drivers/net/ethernet/hisilicon/Kconfig             |   9 +
 drivers/net/ethernet/hisilicon/Makefile            |   1 +
 drivers/net/ethernet/hisilicon/hip04_eth.c         | 968 +++++++++++++++++++++
 drivers/net/ethernet/hisilicon/hip04_mdio.c        | 186 ++++
 5 files changed, 1252 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

-- 
1.8.0


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next v12 2/3] net: hisilicon: new hip04 MDIO driver
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421140290-5492-1-git-send-email-dingtianhong@huawei.com>

From: Zhangfei Gao <zhangfei.gao@linaro.org>

Hisilicon hip04 platform mdio driver
Reuse Marvell phy drivers/net/phy/marvell.c

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/Kconfig      |   9 ++
 drivers/net/ethernet/hisilicon/Makefile     |   1 +
 drivers/net/ethernet/hisilicon/hip04_mdio.c | 186 ++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
index e942173..a54d897 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -24,4 +24,13 @@ config HIX5HD2_GMAC
 	help
 	  This selects the hix5hd2 mac family network device.
 
+config HIP04_ETH
+	tristate "HISILICON P04 Ethernet support"
+	select PHYLIB
+	select MARVELL_PHY
+	select MFD_SYSCON
+	---help---
+	  If you wish to compile a kernel for a hardware with hisilicon p04 SoC and
+	  want to use the internal ethernet then you should answer Y to this.
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 9175e846..40115a7 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c b/drivers/net/ethernet/hisilicon/hip04_mdio.c
new file mode 100644
index 0000000..b3bac25
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_mdio.c
@@ -0,0 +1,186 @@
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
+#include <linux/of_mdio.h>
+#include <linux/delay.h>
+
+#define MDIO_CMD_REG		0x0
+#define MDIO_ADDR_REG		0x4
+#define MDIO_WDATA_REG		0x8
+#define MDIO_RDATA_REG		0xc
+#define MDIO_STA_REG		0x10
+
+#define MDIO_START		BIT(14)
+#define MDIO_R_VALID		BIT(1)
+#define MDIO_READ	        (BIT(12) | BIT(11) | MDIO_START)
+#define MDIO_WRITE	        (BIT(12) | BIT(10) | MDIO_START)
+
+struct hip04_mdio_priv {
+	void __iomem *base;
+};
+
+#define WAIT_TIMEOUT 10
+static int hip04_mdio_wait_ready(struct mii_bus *bus)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	int i;
+
+	for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) {
+		if (i == WAIT_TIMEOUT)
+			return -ETIMEDOUT;
+		msleep(20);
+	}
+
+	return 0;
+}
+
+static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	val = regnum | (mii_id << 5) | MDIO_READ;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	val = readl_relaxed(priv->base + MDIO_STA_REG);
+	if (val & MDIO_R_VALID) {
+		dev_err(bus->parent, "SMI bus read not valid\n");
+		ret = -ENODEV;
+		goto out;
+	}
+
+	val = readl_relaxed(priv->base + MDIO_RDATA_REG);
+	ret = val & 0xFFFF;
+out:
+	return ret;
+}
+
+static int hip04_mdio_write(struct mii_bus *bus, int mii_id,
+			    int regnum, u16 value)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	writel_relaxed(value, priv->base + MDIO_WDATA_REG);
+	val = regnum | (mii_id << 5) | MDIO_WRITE;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+out:
+	return ret;
+}
+
+static int hip04_mdio_reset(struct mii_bus *bus)
+{
+	int temp, i;
+
+	for (i = 0; i < PHY_MAX_ADDR; i++) {
+		hip04_mdio_write(bus, i, 22, 0);
+		temp = hip04_mdio_read(bus, i, MII_BMCR);
+		if (temp < 0)
+			continue;
+
+		temp |= BMCR_RESET;
+		if (hip04_mdio_write(bus, i, MII_BMCR, temp) < 0)
+			continue;
+	}
+
+	mdelay(500);
+	return 0;
+}
+
+static int hip04_mdio_probe(struct platform_device *pdev)
+{
+	struct resource *r;
+	struct mii_bus *bus;
+	struct hip04_mdio_priv *priv;
+	int ret;
+
+	bus = mdiobus_alloc_size(sizeof(struct hip04_mdio_priv));
+	if (!bus) {
+		dev_err(&pdev->dev, "Cannot allocate MDIO bus\n");
+		return -ENOMEM;
+	}
+
+	bus->name = "hip04_mdio_bus";
+	bus->read = hip04_mdio_read;
+	bus->write = hip04_mdio_write;
+	bus->reset = hip04_mdio_reset;
+	snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(&pdev->dev));
+	bus->parent = &pdev->dev;
+	priv = bus->priv;
+
+	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(&pdev->dev, r);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto out_mdio;
+	}
+
+	ret = of_mdiobus_register(bus, pdev->dev.of_node);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
+		goto out_mdio;
+	}
+
+	platform_set_drvdata(pdev, bus);
+
+	return 0;
+
+out_mdio:
+	mdiobus_free(bus);
+	return ret;
+}
+
+static int hip04_mdio_remove(struct platform_device *pdev)
+{
+	struct mii_bus *bus = platform_get_drvdata(pdev);
+
+	mdiobus_unregister(bus);
+	mdiobus_free(bus);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mdio_match[] = {
+	{ .compatible = "hisilicon,hip04-mdio" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, hip04_mdio_match);
+
+static struct platform_driver hip04_mdio_driver = {
+	.probe = hip04_mdio_probe,
+	.remove = hip04_mdio_remove,
+	.driver = {
+		.name = "hip04-mdio",
+		.owner = THIS_MODULE,
+		.of_match_table = hip04_mdio_match,
+	},
+};
+
+module_platform_driver(hip04_mdio_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 MDIO interface driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-mdio");
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next v12 3/3] net: hisilicon: new hip04 ethernet driver
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421140290-5492-1-git-send-email-dingtianhong@huawei.com>

Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
The controller has no tx done interrupt, reclaim xmitted buffer in the poll.

v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
     for hip04 ethernet.

v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
     is not supported for this patch, but I think it could work for hip04,
     will support it later after some tests for performance better.

     Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
     it looks that the performance and latency is more better by tx_coalesce_frames/usecs.

     - Before:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

     - After:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10: According David Miller and Arnd Bergmann's suggestion, add some modification
     for v9 version
     - drop the workqueue
     - batch cleanup based on tx_coalesce_frames/usecs for better throughput
     - use a reasonable default tx timeout (200us, could be shorted
       based on measurements) with a range timer
     - fix napi poll function return value
     - use a lockless queue for cleanup

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/Makefile    |   2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c | 968 +++++++++++++++++++++++++++++
 2 files changed, 969 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 40115a7..6c14540 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,4 +3,4 @@
 #
 
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..a50530d
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,968 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/ktime.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x08
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_REG		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_CPU_ADD_ADDR		0x580
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define RCV_DROP			BIT(7)
+#define TX_DROP				BIT(6)
+#define DEF_INT_ERR			(RCV_NOBUF | RCV_DROP | TX_DROP)
+#define DEF_INT_MASK			(RCV_INT | DEF_INT_ERR)
+
+/* TX descriptor config */
+#define TX_FREE_MEM			BIT(0)
+#define TX_READ_ALLOC_L3		BIT(1)
+#define TX_FINISH_CACHE_INV		BIT(2)
+#define TX_CLEAR_WB			BIT(4)
+#define TX_L3_CHECKSUM			BIT(5)
+#define TX_LOOP_BACK			BIT(11)
+
+/* RX error */
+#define RX_PKT_DROP			BIT(0)
+#define RX_L2_ERR			BIT(1)
+#define RX_PKT_ERR			(RX_PKT_DROP | RX_L2_ERR)
+
+#define SGMII_SPEED_1000		0x08
+#define SGMII_SPEED_100			0x07
+#define SGMII_SPEED_10			0x06
+#define MII_SPEED_100			0x01
+#define MII_SPEED_10			0x00
+
+#define GE_DUPLEX_FULL			BIT(0)
+#define GE_DUPLEX_HALF			0x00
+#define GE_MODE_CHANGE_EN		BIT(0)
+
+#define GE_TX_AUTO_NEG			BIT(5)
+#define GE_TX_ADD_CRC			BIT(6)
+#define GE_TX_SHORT_PAD_THROUGH		BIT(7)
+
+#define GE_RX_STRIP_CRC			BIT(0)
+#define GE_RX_STRIP_PAD			BIT(3)
+#define GE_RX_PAD_EN			BIT(4)
+
+#define GE_AUTO_NEG_CTL			BIT(0)
+
+#define GE_RX_INT_THRESHOLD		BIT(6)
+#define GE_RX_TIMEOUT			0x04
+
+#define GE_RX_PORT_EN			BIT(1)
+#define GE_TX_PORT_EN			BIT(2)
+
+#define PPE_CFG_STS_RX_PKT_CNT_RC	BIT(12)
+
+#define PPE_CFG_RX_PKT_ALIGN		BIT(18)
+#define PPE_CFG_QOS_VMID_MODE		BIT(14)
+#define PPE_CFG_QOS_VMID_GRP_SHIFT	8
+
+#define PPE_CFG_RX_FIFO_FSFU		BIT(11)
+#define PPE_CFG_RX_DEPTH_SHIFT		16
+#define PPE_CFG_RX_START_SHIFT		0
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT	11
+
+#define PPE_CFG_BUS_LOCAL_REL		BIT(14)
+#define PPE_CFG_BUS_BIG_ENDIEN		BIT(0)
+
+#define RX_DESC_NUM			128
+#define TX_DESC_NUM			256
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define GMAC_MIN_PKT_LEN		31
+#define RX_BUF_SIZE			1600
+#define RESET_TIMEOUT			1000
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+#define DRV_VERSION			"v1.0"
+
+#define HIP04_MAX_TX_COALESCE_USECS	200
+#define HIP04_MIN_TX_COALESCE_USECS	100
+#define HIP04_MAX_TX_COALESCE_FRAMES	200
+#define HIP04_MIN_TX_COALESCE_FRAMES	100
+
+struct tx_desc {
+	u32 send_addr;
+	u32 send_size;
+	u32 next_addr;
+	u32 cfg;
+	u32 wb_addr;
+} __aligned(64);
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	unsigned int tx_head;
+
+	int tx_coalesce_frames;
+	int tx_coalesce_usecs;
+	struct hrtimer tx_coalesce_timer;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct work_struct tx_timeout_task;
+
+	/* written only by tx cleanup */
+	unsigned int tx_tail ____cacheline_aligned_in_smp;
+};
+
+static inline unsigned int tx_count(unsigned int head, unsigned int tail)
+{
+	return (head - tail) % (TX_DESC_NUM - 1);
+}
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = SGMII_SPEED_1000;
+		else if (speed == SPEED_100)
+			val = SGMII_SPEED_100;
+		else
+			val = SGMII_SPEED_10;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = MII_SPEED_100;
+		else
+			val = MII_SPEED_10;
+		break;
+	default:
+		netdev_warn(ndev, "not supported mode\n");
+		val = MII_SPEED_10;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = GE_MODE_CHANGE_EN;
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp, timeout = 0;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+		if (timeout++ > RESET_TIMEOUT)
+			break;
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= PPE_CFG_STS_RX_PKT_CNT_RC;
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
+	val |= PPE_CFG_QOS_VMID_MODE;
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
+	val |= PPE_CFG_RX_FIFO_FSFU;
+	val |= priv->chan << PPE_CFG_RX_START_SHIFT;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	val = PPE_CFG_RX_PKT_ALIGN;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	val = GMAC_PPE_RX_PKT_MAX_LEN;
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	val = GMAC_MAX_PKT_LEN;
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	val = GMAC_MIN_PKT_LEN;
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	val = GE_RX_STRIP_CRC;
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	val = GE_AUTO_NEG_CTL;
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(DEF_INT_MASK);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+		       priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+		       priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static int hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
+	unsigned int count;
+
+	smp_rmb();
+	count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
+	if (count == 0)
+		goto out;
+
+	while (count) {
+		desc = &priv->tx_desc[tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+					 priv->tx_skb[tx_tail]->len,
+					 DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		pkts_compl++;
+		bytes_compl += priv->tx_skb[tx_tail]->len;
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		count--;
+	}
+
+	priv->tx_tail = tx_tail;
+	smp_wmb(); /* Ensure tx_tail visible to xmit */
+
+out:
+	if (pkts_compl || bytes_compl)
+		netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+	if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
+		netif_wake_queue(ndev);
+
+	return count;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head, count;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	smp_rmb();
+	count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
+	if (count == (TX_DESC_NUM - 1)) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be32(skb->len);
+	desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	count++;
+	netdev_sent_queue(ndev, skb->len);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+
+	/* Ensure tx_head update visible to tx reclaim */
+	smp_wmb();
+
+	/* queue is getting full, better start cleaning up now */
+	if (count >= priv->tx_coalesce_frames) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt and timer */
+			priv->reg_inten &= ~(RCV_INT);
+			writel_relaxed(DEF_INT_MASK & ~RCV_INT,
+				       priv->base + PPE_INTEN);
+			hrtimer_cancel(&priv->tx_coalesce_timer);
+			__napi_schedule(&priv->napi);
+		}
+	} else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
+		/* cleanup not pending yet, start a new timer */
+		hrtimer_start_expires(&priv->tx_coalesce_timer,
+				      HRTIMER_MODE_REL);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	int tx_remaining;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb))
+			net_dbg_ratelimited("build_skb failed\n");
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				 RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (0 == len) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			goto done;
+		phys = dma_map_single(&ndev->dev, buf,
+				      RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			goto done;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			goto done;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (!(priv->reg_inten & RCV_INT)) {
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+	napi_complete(napi);
+done:
+	/* clean up tx descriptors and start a new timer if necessary */
+	tx_remaining = hip04_tx_reclaim(ndev, false);
+	if (rx < budget && tx_remaining)
+		hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *)dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	if (!ists)
+		return IRQ_NONE;
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (unlikely(ists & DEF_INT_ERR)) {
+		if (ists & (RCV_NOBUF | RCV_DROP))
+			stats->rx_errors++;
+			stats->rx_dropped++;
+			netdev_err(ndev, "rx drop\n");
+		if (ists & TX_DROP) {
+			stats->tx_dropped++;
+			netdev_err(ndev, "tx drop\n");
+		}
+	}
+
+	if (ists & RCV_INT && napi_schedule_prep(&priv->napi)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT);
+		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+		hrtimer_cancel(&priv->tx_coalesce_timer);
+		__napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+enum hrtimer_restart tx_done(struct hrtimer *hrtimer)
+{
+	struct hip04_priv *priv;
+	priv = container_of(hrtimer, struct hip04_priv, tx_coalesce_timer);
+
+	if (napi_schedule_prep(&priv->napi)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT);
+		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+		__napi_schedule(&priv->napi);
+	}
+
+	return HRTIMER_NORESTART;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	hip04_reset_ppe(priv);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				      RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netdev_reset_queue(ndev);
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					 RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	schedule_work(&priv->tx_timeout_task);
+}
+
+static void hip04_tx_timeout_task(struct work_struct *work)
+{
+	struct hip04_priv *priv;
+
+	priv = container_of(work, struct hip04_priv, tx_timeout_task);
+	hip04_mac_stop(priv->ndev);
+	hip04_mac_open(priv->ndev);
+}
+
+static struct net_device_stats *hip04_get_stats(struct net_device *ndev)
+{
+	return &ndev->stats;
+}
+
+static int hip04_get_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct hip04_priv *priv = netdev_priv(netdev);
+
+	ec->tx_coalesce_usecs = priv->tx_coalesce_usecs;
+	ec->tx_max_coalesced_frames = priv->tx_coalesce_frames;
+
+	return 0;
+}
+
+static int hip04_set_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct hip04_priv *priv = netdev_priv(netdev);
+
+	/* Check not supported parameters  */
+	if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
+	    (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
+	    (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
+	    (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
+	    (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
+	    (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
+	    (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
+	    (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
+	    (ec->tx_max_coalesced_frames_irq) ||
+	    (ec->stats_block_coalesce_usecs) ||
+	    (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
+		return -EOPNOTSUPP;
+
+	if ((ec->tx_coalesce_usecs > HIP04_MAX_TX_COALESCE_USECS ||
+	     ec->tx_coalesce_usecs < HIP04_MIN_TX_COALESCE_USECS) ||
+	    (ec->tx_max_coalesced_frames > HIP04_MAX_TX_COALESCE_FRAMES ||
+	     ec->tx_max_coalesced_frames < HIP04_MIN_TX_COALESCE_FRAMES))
+		return -EINVAL;
+
+	priv->tx_coalesce_usecs = ec->tx_coalesce_usecs;
+	priv->tx_coalesce_frames = ec->tx_max_coalesced_frames;
+
+	return 0;
+}
+
+static void hip04_get_drvinfo(struct net_device *netdev,
+			      struct ethtool_drvinfo *drvinfo)
+{
+	strlcpy(drvinfo->driver, DRV_NAME, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, DRV_VERSION, sizeof(drvinfo->version));
+}
+
+static struct ethtool_ops hip04_ethtool_ops = {
+	.get_coalesce		= hip04_get_coalesce,
+	.set_coalesce		= hip04_set_coalesce,
+	.get_drvinfo		= hip04_get_drvinfo,
+};
+
+static struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_get_stats		= hip04_get_stats,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			  priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	ktime_t txtime;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1] * RX_DESC_NUM;
+
+	hrtimer_init(&priv->tx_coalesce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+	/*
+	 * BQL will try to keep the TX queue as short as possible, but it can't
+	 * be faster than tx_coalesce_usecs, so we need a fast timeout here,
+	 * but also long enough to gather up enough frames to ensure we don't
+	 * get more interrupts than necessary.
+	 * 200us is enough for 16 frames of 1500 bytes at gigabit ethernet rate
+	 */
+	priv->tx_coalesce_frames = TX_DESC_NUM * 3 / 4;
+	priv->tx_coalesce_usecs = 200;
+	/* allow timer to fire after half the time at the earliest */
+	txtime = ktime_set(0, priv->tx_coalesce_usecs * NSEC_PER_USEC / 2);
+	hrtimer_set_expires_range(&priv->tx_coalesce_timer, txtime, txtime);
+	priv->tx_coalesce_timer.function = tx_done;
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "not find phy-mode\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+			       0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	INIT_WORK(&priv->tx_timeout_task, hip04_tx_timeout_task);
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->ethtool_ops = &hip04_ethtool_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, NAPI_POLL_WEIGHT);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = register_netdev(ndev);
+	if (ret) {
+		free_netdev(ndev);
+		goto alloc_fail;
+	}
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	hip04_free_ring(ndev, d);
+	unregister_netdev(ndev);
+	free_irq(ndev->irq, ndev);
+	of_node_put(priv->phy_node);
+	cancel_work_sync(&priv->tx_timeout_task);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(of, hip04_mac_match);
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next v12 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421140290-5492-1-git-send-email-dingtianhong@huawei.com>

From: Zhangfei Gao <zhangfei.gao@linaro.org>

This patch adds the Device Tree bindings for the Hisilicon hip04
Ethernet controller, including 100M / 1000M controller.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 .../bindings/net/hisilicon-hip04-net.txt           | 88 ++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
new file mode 100644
index 0000000..988fc69
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -0,0 +1,88 @@
+Hisilicon hip04 Ethernet Controller
+
+* Ethernet controller node
+
+Required properties:
+- compatible: should be "hisilicon,hip04-mac".
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device.
+- port-handle: <phandle port channel>
+	phandle, specifies a reference to the syscon ppe node
+	port, port number connected to the controller
+	channel, recv channel start from channel * number (RX_DESC_NUM)
+- phy-mode: see ethernet.txt [1].
+
+Optional properties:
+- phy-handle: see ethernet.txt [1].
+
+[1] Documentation/devicetree/bindings/net/ethernet.txt
+
+
+* Ethernet ppe node:
+Control rx & tx fifos of all ethernet controllers.
+Have 2048 recv channels shared by all ethernet controllers, only if no overlap.
+Each controller's recv channel start from channel * number (RX_DESC_NUM).
+
+Required properties:
+- compatible: "hisilicon,hip04-ppe", "syscon".
+- reg: address and length of the register set for the device.
+
+
+* MDIO bus node:
+
+Required properties:
+
+- compatible: should be "hisilicon,hip04-mdio".
+- Inherits from MDIO bus node binding [2]
+[2] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+	mdio {
+		compatible = "hisilicon,hip04-mdio";
+		reg = <0x28f1000 0x1000>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		phy0: ethernet-phy@0 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			reg = <0>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+		};
+
+		phy1: ethernet-phy@1 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			reg = <1>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+		};
+	};
+
+	ppe: ppe@28c0000 {
+		compatible = "hisilicon,hip04-ppe", "syscon";
+		reg = <0x28c0000 0x10000>;
+	};
+
+	fe: ethernet@28b0000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x28b0000 0x10000>;
+		interrupts = <0 413 4>;
+		phy-mode = "mii";
+		port-handle = <&ppe 31 0>;
+	};
+
+	ge0: ethernet@2800000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x2800000 0x10000>;
+		interrupts = <0 402 4>;
+		phy-mode = "sgmii";
+		port-handle = <&ppe 0 1>;
+		phy-handle = <&phy0>;
+	};
+
+	ge8: ethernet@2880000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x2880000 0x10000>;
+		interrupts = <0 410 4>;
+		phy-mode = "sgmii";
+		port-handle = <&ppe 8 2>;
+		phy-handle = <&phy1>;
+	};
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH net-next] rhashtable: unnecessary to use delayed work
From: Thomas Graf @ 2015-01-13  9:35 UTC (permalink / raw)
  To: Ying Xue; +Cc: davem, netdev
In-Reply-To: <1421139645-1588-1-git-send-email-ying.xue@windriver.com>

On 01/13/15 at 05:00pm, Ying Xue wrote:
> When we put our declared work task in the global workqueue with
> schedule_delayed_work(), its delay parameter is always zero.
> Therefore, we should define a normal work in rhashtable structure
> instead of a delayed work.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>

> @@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
>  
>  	mutex_lock(&ht->mutex);
>  
> -	cancel_delayed_work(&ht->run_work);
> +	cancel_work_sync(&ht->run_work);
>  	bucket_table_free(rht_dereference(ht->tbl, ht));
>  
>  	mutex_unlock(&ht->mutex);

I like the patch!

I think it introduces a possible dead lock though (see below). OTOH, it
could actually explain the reason for the 0day lock debug splash that
was reported.

Dead lock: The worker could already have been kicked off but was
interrupted before it acquired ht->mutex. rhashtable_destroy() is
called and acquired ht->mutex. cancel_work_sync() waits for worker to
finish while holding ht->mutex. Worker can't finish because it needs to
acquire ht->mutex to do so.

For the very same reason the reported warning could have been triggered.
Instead of the dead lock, it would have called bucket_table_free()
with a deferred resizer still underway.

What about we do something like this?

void rhashtable_destroy(struct rhashtable *ht)
{
        ht->being_destroyed = true;
	cancel_work_sync(&ht->run_work);

	mutex_lock(&ht->mutex);
	bucket_table_free(rht_dereference(ht->tbl, ht));
	mutex_unlock(&ht->mutex);
}

If you agree we can explain this shortly in the commit message and add:
Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")

^ permalink raw reply

* [PATCH net-next] cxgb4: Ripping out old hard-wired initialization code in driver
From: Hariprasad Shenai @ 2015-01-13  9:49 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, Hariprasad Shenai

Removing old hard-wired initialization code in the driver, which is no longer
used. Also deprecating few module parameters.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  480 +++--------------------
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |   98 +----
 2 files changed, 58 insertions(+), 520 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 23ae0b7..082a596 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -79,99 +79,6 @@
 #define DRV_VERSION "2.0.0-ko"
 #define DRV_DESC "Chelsio T4/T5 Network Driver"
 
-/*
- * Max interrupt hold-off timer value in us.  Queues fall back to this value
- * under extreme memory pressure so it's largish to give the system time to
- * recover.
- */
-#define MAX_SGE_TIMERVAL 200U
-
-enum {
-	/*
-	 * Physical Function provisioning constants.
-	 */
-	PFRES_NVI = 4,			/* # of Virtual Interfaces */
-	PFRES_NETHCTRL = 128,		/* # of EQs used for ETH or CTRL Qs */
-	PFRES_NIQFLINT = 128,		/* # of ingress Qs/w Free List(s)/intr
-					 */
-	PFRES_NEQ = 256,		/* # of egress queues */
-	PFRES_NIQ = 0,			/* # of ingress queues */
-	PFRES_TC = 0,			/* PCI-E traffic class */
-	PFRES_NEXACTF = 128,		/* # of exact MPS filters */
-
-	PFRES_R_CAPS = FW_CMD_CAP_PF,
-	PFRES_WX_CAPS = FW_CMD_CAP_PF,
-
-#ifdef CONFIG_PCI_IOV
-	/*
-	 * Virtual Function provisioning constants.  We need two extra Ingress
-	 * Queues with Interrupt capability to serve as the VF's Firmware
-	 * Event Queue and Forwarded Interrupt Queue (when using MSI mode) --
-	 * neither will have Free Lists associated with them).  For each
-	 * Ethernet/Control Egress Queue and for each Free List, we need an
-	 * Egress Context.
-	 */
-	VFRES_NPORTS = 1,		/* # of "ports" per VF */
-	VFRES_NQSETS = 2,		/* # of "Queue Sets" per VF */
-
-	VFRES_NVI = VFRES_NPORTS,	/* # of Virtual Interfaces */
-	VFRES_NETHCTRL = VFRES_NQSETS,	/* # of EQs used for ETH or CTRL Qs */
-	VFRES_NIQFLINT = VFRES_NQSETS+2,/* # of ingress Qs/w Free List(s)/intr */
-	VFRES_NEQ = VFRES_NQSETS*2,	/* # of egress queues */
-	VFRES_NIQ = 0,			/* # of non-fl/int ingress queues */
-	VFRES_TC = 0,			/* PCI-E traffic class */
-	VFRES_NEXACTF = 16,		/* # of exact MPS filters */
-
-	VFRES_R_CAPS = FW_CMD_CAP_DMAQ|FW_CMD_CAP_VF|FW_CMD_CAP_PORT,
-	VFRES_WX_CAPS = FW_CMD_CAP_DMAQ|FW_CMD_CAP_VF,
-#endif
-};
-
-/*
- * Provide a Port Access Rights Mask for the specified PF/VF.  This is very
- * static and likely not to be useful in the long run.  We really need to
- * implement some form of persistent configuration which the firmware
- * controls.
- */
-static unsigned int pfvfres_pmask(struct adapter *adapter,
-				  unsigned int pf, unsigned int vf)
-{
-	unsigned int portn, portvec;
-
-	/*
-	 * Give PF's access to all of the ports.
-	 */
-	if (vf == 0)
-		return FW_PFVF_CMD_PMASK_M;
-
-	/*
-	 * For VFs, we'll assign them access to the ports based purely on the
-	 * PF.  We assign active ports in order, wrapping around if there are
-	 * fewer active ports than PFs: e.g. active port[pf % nports].
-	 * Unfortunately the adapter's port_info structs haven't been
-	 * initialized yet so we have to compute this.
-	 */
-	if (adapter->params.nports == 0)
-		return 0;
-
-	portn = pf % adapter->params.nports;
-	portvec = adapter->params.portvec;
-	for (;;) {
-		/*
-		 * Isolate the lowest set bit in the port vector.  If we're at
-		 * the port number that we want, return that as the pmask.
-		 * otherwise mask that bit out of the port vector and
-		 * decrement our port number ...
-		 */
-		unsigned int pmask = portvec ^ (portvec & (portvec-1));
-		if (portn == 0)
-			return pmask;
-		portn--;
-		portvec &= ~pmask;
-	}
-	/*NOTREACHED*/
-}
-
 enum {
 	MAX_TXQ_ENTRIES      = 16384,
 	MAX_CTRL_TXQ_ENTRIES = 1024,
@@ -264,7 +171,8 @@ MODULE_PARM_DESC(force_init, "Forcibly become Master PF and initialize adapter")
 static uint force_old_init;
 
 module_param(force_old_init, uint, 0644);
-MODULE_PARM_DESC(force_old_init, "Force old initialization sequence");
+MODULE_PARM_DESC(force_old_init, "Force old initialization sequence, deprecated"
+		 " parameter");
 
 static int dflt_msg_enable = DFLT_MSG_ENABLE;
 
@@ -293,13 +201,14 @@ static unsigned int intr_holdoff[SGE_NTIMERS - 1] = { 5, 10, 20, 50, 100 };
 
 module_param_array(intr_holdoff, uint, NULL, 0644);
 MODULE_PARM_DESC(intr_holdoff, "values for queue interrupt hold-off timers "
-		 "0..4 in microseconds");
+		 "0..4 in microseconds, deprecated parameter");
 
 static unsigned int intr_cnt[SGE_NCOUNTERS - 1] = { 4, 8, 16 };
 
 module_param_array(intr_cnt, uint, NULL, 0644);
 MODULE_PARM_DESC(intr_cnt,
-		 "thresholds 1..3 for queue interrupt packet counters");
+		 "thresholds 1..3 for queue interrupt packet counters, "
+		 "deprecated parameter");
 
 /*
  * Normally we tell the chip to deliver Ingress Packets into our DMA buffers
@@ -319,7 +228,8 @@ static bool vf_acls;
 
 #ifdef CONFIG_PCI_IOV
 module_param(vf_acls, bool, 0644);
-MODULE_PARM_DESC(vf_acls, "if set enable virtualization L2 ACL enforcement");
+MODULE_PARM_DESC(vf_acls, "if set enable virtualization L2 ACL enforcement, "
+		 "deprecated parameter");
 
 /* Configure the number of PCI-E Virtual Function which are to be instantiated
  * on SR-IOV Capable Physical Functions.
@@ -341,32 +251,11 @@ module_param(select_queue, int, 0644);
 MODULE_PARM_DESC(select_queue,
 		 "Select between kernel provided method of selecting or driver method of selecting TX queue. Default is kernel method.");
 
-/*
- * The filter TCAM has a fixed portion and a variable portion.  The fixed
- * portion can match on source/destination IP IPv4/IPv6 addresses and TCP/UDP
- * ports.  The variable portion is 36 bits which can include things like Exact
- * Match MAC Index (9 bits), Ether Type (16 bits), IP Protocol (8 bits),
- * [Inner] VLAN Tag (17 bits), etc. which, if all were somehow selected, would
- * far exceed the 36-bit budget for this "compressed" header portion of the
- * filter.  Thus, we have a scarce resource which must be carefully managed.
- *
- * By default we set this up to mostly match the set of filter matching
- * capabilities of T3 but with accommodations for some of T4's more
- * interesting features:
- *
- *   { IP Fragment (1), MPS Match Type (3), IP Protocol (8),
- *     [Inner] VLAN (17), Port (3), FCoE (1) }
- */
-enum {
-	TP_VLAN_PRI_MAP_DEFAULT = HW_TPL_FR_MT_PR_IV_P_FC,
-	TP_VLAN_PRI_MAP_FIRST = FCOE_S,
-	TP_VLAN_PRI_MAP_LAST = FRAGMENTATION_S,
-};
-
-static unsigned int tp_vlan_pri_map = TP_VLAN_PRI_MAP_DEFAULT;
+static unsigned int tp_vlan_pri_map = HW_TPL_FR_MT_PR_IV_P_FC;
 
 module_param(tp_vlan_pri_map, uint, 0644);
-MODULE_PARM_DESC(tp_vlan_pri_map, "global compressed filter configuration");
+MODULE_PARM_DESC(tp_vlan_pri_map, "global compressed filter configuration, "
+		 "deprecated parameter");
 
 static struct dentry *cxgb4_debugfs_root;
 
@@ -5225,12 +5114,9 @@ static int adap_init0_config(struct adapter *adapter, int reset)
 	if (ret < 0)
 		goto bye;
 
-	/*
-	 * Return successfully and note that we're operating with parameters
-	 * not supplied by the driver, rather than from hard-wired
-	 * initialization constants burried in the driver.
+	/* Emit Firmware Configuration File information and return
+	 * successfully.
 	 */
-	adapter->flags |= USING_SOFT_PARAMS;
 	dev_info(adapter->pdev_dev, "Successfully configured using Firmware "\
 		 "Configuration File \"%s\", version %#x, computed checksum %#x\n",
 		 config_name, finiver, cfcsum);
@@ -5248,248 +5134,6 @@ bye:
 	return ret;
 }
 
-/*
- * Attempt to initialize the adapter via hard-coded, driver supplied
- * parameters ...
- */
-static int adap_init0_no_config(struct adapter *adapter, int reset)
-{
-	struct sge *s = &adapter->sge;
-	struct fw_caps_config_cmd caps_cmd;
-	u32 v;
-	int i, ret;
-
-	/*
-	 * Reset device if necessary
-	 */
-	if (reset) {
-		ret = t4_fw_reset(adapter, adapter->mbox,
-				  PIORSTMODE_F | PIORST_F);
-		if (ret < 0)
-			goto bye;
-	}
-
-	/*
-	 * Get device capabilities and select which we'll be using.
-	 */
-	memset(&caps_cmd, 0, sizeof(caps_cmd));
-	caps_cmd.op_to_write = htonl(FW_CMD_OP_V(FW_CAPS_CONFIG_CMD) |
-				     FW_CMD_REQUEST_F | FW_CMD_READ_F);
-	caps_cmd.cfvalid_to_len16 = htonl(FW_LEN16(caps_cmd));
-	ret = t4_wr_mbox(adapter, adapter->mbox, &caps_cmd, sizeof(caps_cmd),
-			 &caps_cmd);
-	if (ret < 0)
-		goto bye;
-
-	if (caps_cmd.niccaps & htons(FW_CAPS_CONFIG_NIC_VM)) {
-		if (!vf_acls)
-			caps_cmd.niccaps ^= htons(FW_CAPS_CONFIG_NIC_VM);
-		else
-			caps_cmd.niccaps = htons(FW_CAPS_CONFIG_NIC_VM);
-	} else if (vf_acls) {
-		dev_err(adapter->pdev_dev, "virtualization ACLs not supported");
-		goto bye;
-	}
-	caps_cmd.op_to_write = htonl(FW_CMD_OP_V(FW_CAPS_CONFIG_CMD) |
-			      FW_CMD_REQUEST_F | FW_CMD_WRITE_F);
-	ret = t4_wr_mbox(adapter, adapter->mbox, &caps_cmd, sizeof(caps_cmd),
-			 NULL);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Tweak configuration based on system architecture, module
-	 * parameters, etc.
-	 */
-	ret = adap_init0_tweaks(adapter);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Select RSS Global Mode we want to use.  We use "Basic Virtual"
-	 * mode which maps each Virtual Interface to its own section of
-	 * the RSS Table and we turn on all map and hash enables ...
-	 */
-	adapter->flags |= RSS_TNLALLLOOKUP;
-	ret = t4_config_glbl_rss(adapter, adapter->mbox,
-				 FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL,
-				 FW_RSS_GLB_CONFIG_CMD_TNLMAPEN_F |
-				 FW_RSS_GLB_CONFIG_CMD_HASHTOEPLITZ_F |
-				 ((adapter->flags & RSS_TNLALLLOOKUP) ?
-					FW_RSS_GLB_CONFIG_CMD_TNLALLLKP_F : 0));
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Set up our own fundamental resource provisioning ...
-	 */
-	ret = t4_cfg_pfvf(adapter, adapter->mbox, adapter->fn, 0,
-			  PFRES_NEQ, PFRES_NETHCTRL,
-			  PFRES_NIQFLINT, PFRES_NIQ,
-			  PFRES_TC, PFRES_NVI,
-			  FW_PFVF_CMD_CMASK_M,
-			  pfvfres_pmask(adapter, adapter->fn, 0),
-			  PFRES_NEXACTF,
-			  PFRES_R_CAPS, PFRES_WX_CAPS);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Perform low level SGE initialization.  We need to do this before we
-	 * send the firmware the INITIALIZE command because that will cause
-	 * any other PF Drivers which are waiting for the Master
-	 * Initialization to proceed forward.
-	 */
-	for (i = 0; i < SGE_NTIMERS - 1; i++)
-		s->timer_val[i] = min(intr_holdoff[i], MAX_SGE_TIMERVAL);
-	s->timer_val[SGE_NTIMERS - 1] = MAX_SGE_TIMERVAL;
-	s->counter_val[0] = 1;
-	for (i = 1; i < SGE_NCOUNTERS; i++)
-		s->counter_val[i] = min(intr_cnt[i - 1], THRESHOLD_0_M);
-	t4_sge_init(adapter);
-
-#ifdef CONFIG_PCI_IOV
-	/*
-	 * Provision resource limits for Virtual Functions.  We currently
-	 * grant them all the same static resource limits except for the Port
-	 * Access Rights Mask which we're assigning based on the PF.  All of
-	 * the static provisioning stuff for both the PF and VF really needs
-	 * to be managed in a persistent manner for each device which the
-	 * firmware controls.
-	 */
-	{
-		int pf, vf;
-
-		for (pf = 0; pf < ARRAY_SIZE(num_vf); pf++) {
-			if (num_vf[pf] <= 0)
-				continue;
-
-			/* VF numbering starts at 1! */
-			for (vf = 1; vf <= num_vf[pf]; vf++) {
-				ret = t4_cfg_pfvf(adapter, adapter->mbox,
-						  pf, vf,
-						  VFRES_NEQ, VFRES_NETHCTRL,
-						  VFRES_NIQFLINT, VFRES_NIQ,
-						  VFRES_TC, VFRES_NVI,
-						  FW_PFVF_CMD_CMASK_M,
-						  pfvfres_pmask(
-						  adapter, pf, vf),
-						  VFRES_NEXACTF,
-						  VFRES_R_CAPS, VFRES_WX_CAPS);
-				if (ret < 0)
-					dev_warn(adapter->pdev_dev,
-						 "failed to "\
-						 "provision pf/vf=%d/%d; "
-						 "err=%d\n", pf, vf, ret);
-			}
-		}
-	}
-#endif
-
-	/*
-	 * Set up the default filter mode.  Later we'll want to implement this
-	 * via a firmware command, etc. ...  This needs to be done before the
-	 * firmare initialization command ...  If the selected set of fields
-	 * isn't equal to the default value, we'll need to make sure that the
-	 * field selections will fit in the 36-bit budget.
-	 */
-	if (tp_vlan_pri_map != TP_VLAN_PRI_MAP_DEFAULT) {
-		int j, bits = 0;
-
-		for (j = TP_VLAN_PRI_MAP_FIRST; j <= TP_VLAN_PRI_MAP_LAST; j++)
-			switch (tp_vlan_pri_map & (1 << j)) {
-			case 0:
-				/* compressed filter field not enabled */
-				break;
-			case FCOE_F:
-				bits +=  1;
-				break;
-			case PORT_F:
-				bits +=  3;
-				break;
-			case VNIC_F:
-				bits += 17;
-				break;
-			case VLAN_F:
-				bits += 17;
-				break;
-			case TOS_F:
-				bits +=  8;
-				break;
-			case PROTOCOL_F:
-				bits +=  8;
-				break;
-			case ETHERTYPE_F:
-				bits += 16;
-				break;
-			case MACMATCH_F:
-				bits +=  9;
-				break;
-			case MPSHITTYPE_F:
-				bits +=  3;
-				break;
-			case FRAGMENTATION_F:
-				bits +=  1;
-				break;
-			}
-
-		if (bits > 36) {
-			dev_err(adapter->pdev_dev,
-				"tp_vlan_pri_map=%#x needs %d bits > 36;"\
-				" using %#x\n", tp_vlan_pri_map, bits,
-				TP_VLAN_PRI_MAP_DEFAULT);
-			tp_vlan_pri_map = TP_VLAN_PRI_MAP_DEFAULT;
-		}
-	}
-	v = tp_vlan_pri_map;
-	t4_write_indirect(adapter, TP_PIO_ADDR_A, TP_PIO_DATA_A,
-			  &v, 1, TP_VLAN_PRI_MAP_A);
-
-	/*
-	 * We need Five Tuple Lookup mode to be set in TP_GLOBAL_CONFIG order
-	 * to support any of the compressed filter fields above.  Newer
-	 * versions of the firmware do this automatically but it doesn't hurt
-	 * to set it here.  Meanwhile, we do _not_ need to set Lookup Every
-	 * Packet in TP_INGRESS_CONFIG to support matching non-TCP packets
-	 * since the firmware automatically turns this on and off when we have
-	 * a non-zero number of filters active (since it does have a
-	 * performance impact).
-	 */
-	if (tp_vlan_pri_map)
-		t4_set_reg_field(adapter, TP_GLOBAL_CONFIG_A,
-				 FIVETUPLELOOKUP_V(FIVETUPLELOOKUP_M),
-				 FIVETUPLELOOKUP_V(FIVETUPLELOOKUP_M));
-
-	/*
-	 * Tweak some settings.
-	 */
-	t4_write_reg(adapter, TP_SHIFT_CNT_A, SYNSHIFTMAX_V(6) |
-		     RXTSHIFTMAXR1_V(4) | RXTSHIFTMAXR2_V(15) |
-		     PERSHIFTBACKOFFMAX_V(8) | PERSHIFTMAX_V(8) |
-		     KEEPALIVEMAXR1_V(4) | KEEPALIVEMAXR2_V(9));
-
-	/*
-	 * Get basic stuff going by issuing the Firmware Initialize command.
-	 * Note that this _must_ be after all PFVF commands ...
-	 */
-	ret = t4_fw_initialize(adapter, adapter->mbox);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Return successfully!
-	 */
-	dev_info(adapter->pdev_dev, "Successfully configured using built-in "\
-		 "driver parameters\n");
-	return 0;
-
-	/*
-	 * Something bad happened.  Return the error ...
-	 */
-bye:
-	return ret;
-}
-
 static struct fw_info fw_info_array[] = {
 	{
 		.chip = CHELSIO_T4,
@@ -5662,88 +5306,58 @@ static int adap_init0(struct adapter *adap)
 	adap->params.nports = hweight32(port_vec);
 	adap->params.portvec = port_vec;
 
-	/*
-	 * If the firmware is initialized already (and we're not forcing a
-	 * master initialization), note that we're living with existing
-	 * adapter parameters.  Otherwise, it's time to try initializing the
-	 * adapter ...
+	/* If the firmware is initialized already, emit a simply note to that
+	 * effect. Otherwise, it's time to try initializing the adapter.
 	 */
 	if (state == DEV_STATE_INIT) {
 		dev_info(adap->pdev_dev, "Coming up as %s: "\
 			 "Adapter already initialized\n",
 			 adap->flags & MASTER_PF ? "MASTER" : "SLAVE");
-		adap->flags |= USING_SOFT_PARAMS;
 	} else {
 		dev_info(adap->pdev_dev, "Coming up as MASTER: "\
 			 "Initializing adapter\n");
-		/*
-		 * If the firmware doesn't support Configuration
-		 * Files warn user and exit,
+
+		/* Find out whether we're dealing with a version of the
+		 * firmware which has configuration file support.
 		 */
-		if (ret < 0)
-			dev_warn(adap->pdev_dev, "Firmware doesn't support "
-				 "configuration file.\n");
-		if (force_old_init)
-			ret = adap_init0_no_config(adap, reset);
-		else {
-			/*
-			 * Find out whether we're dealing with a version of
-			 * the firmware which has configuration file support.
-			 */
-			params[0] = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DEV) |
-				     FW_PARAMS_PARAM_X_V(
-					     FW_PARAMS_PARAM_DEV_CF));
-			ret = t4_query_params(adap, adap->mbox, adap->fn, 0, 1,
-					      params, val);
-
-			/*
-			 * If the firmware doesn't support Configuration
-			 * Files, use the old Driver-based, hard-wired
-			 * initialization.  Otherwise, try using the
-			 * Configuration File support and fall back to the
-			 * Driver-based initialization if there's no
-			 * Configuration File found.
-			 */
-			if (ret < 0)
-				ret = adap_init0_no_config(adap, reset);
-			else {
-				/*
-				 * The firmware provides us with a memory
-				 * buffer where we can load a Configuration
-				 * File from the host if we want to override
-				 * the Configuration File in flash.
-				 */
+		params[0] = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DEV) |
+			     FW_PARAMS_PARAM_X_V(FW_PARAMS_PARAM_DEV_CF));
+		ret = t4_query_params(adap, adap->mbox, adap->fn, 0, 1,
+				      params, val);
 
-				ret = adap_init0_config(adap, reset);
-				if (ret == -ENOENT) {
-					dev_info(adap->pdev_dev,
-					    "No Configuration File present "
-					    "on adapter. Using hard-wired "
-					    "configuration parameters.\n");
-					ret = adap_init0_no_config(adap, reset);
-				}
-			}
+		/* If the firmware doesn't support Configuration Files,
+		 * return an error.
+		 */
+		if (ret < 0) {
+			dev_err(adap->pdev_dev, "firmware doesn't support "
+				"Firmware Configuration Files\n");
+			goto bye;
+		}
+
+		/* The firmware provides us with a memory buffer where we can
+		 * load a Configuration File from the host if we want to
+		 * override the Configuration File in flash.
+		 */
+		ret = adap_init0_config(adap, reset);
+		if (ret == -ENOENT) {
+			dev_err(adap->pdev_dev, "no Configuration File "
+				"present on adapter.\n");
+			goto bye;
 		}
 		if (ret < 0) {
-			dev_err(adap->pdev_dev,
-				"could not initialize adapter, error %d\n",
-				-ret);
+			dev_err(adap->pdev_dev, "could not initialize "
+				"adapter, error %d\n", -ret);
 			goto bye;
 		}
 	}
 
-	/*
-	 * If we're living with non-hard-coded parameters (either from a
-	 * Firmware Configuration File or values programmed by a different PF
-	 * Driver), give the SGE code a chance to pull in anything that it
-	 * needs ...  Note that this must be called after we retrieve our VPD
-	 * parameters in order to know how to convert core ticks to seconds.
+	/* Give the SGE code a chance to pull in anything that it needs ...
+	 * Note that this must be called after we retrieve our VPD parameters
+	 * in order to know how to convert core ticks to seconds, etc.
 	 */
-	if (adap->flags & USING_SOFT_PARAMS) {
-		ret = t4_sge_init(adap);
-		if (ret < 0)
-			goto bye;
-	}
+	ret = t4_sge_init(adap);
+	if (ret < 0)
+		goto bye;
 
 	if (is_bypass_device(adap->pdev->device))
 		adap->params.bypass = 1;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index a79fa6a..ca42e2e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2742,24 +2742,11 @@ void t4_sge_stop(struct adapter *adap)
 }
 
 /**
- *	t4_sge_init - initialize SGE
+ *	t4_sge_init_soft - grab core SGE values needed by SGE code
  *	@adap: the adapter
  *
- *	Performs SGE initialization needed every time after a chip reset.
- *	We do not initialize any of the queues here, instead the driver
- *	top-level must request them individually.
- *
- *	Called in two different modes:
- *
- *	 1. Perform actual hardware initialization and record hard-coded
- *	    parameters which were used.  This gets used when we're the
- *	    Master PF and the Firmware Configuration File support didn't
- *	    work for some reason.
- *
- *	 2. We're not the Master PF or initialization was performed with
- *	    a Firmware Configuration File.  In this case we need to grab
- *	    any of the SGE operating parameters that we need to have in
- *	    order to do our job and make sure we can live with them ...
+ *	We need to grab the SGE operating parameters that we need to have
+ *	in order to do our job and make sure we can live with them.
  */
 
 static int t4_sge_init_soft(struct adapter *adap)
@@ -2852,73 +2839,13 @@ static int t4_sge_init_soft(struct adapter *adap)
 	return 0;
 }
 
-static int t4_sge_init_hard(struct adapter *adap)
-{
-	struct sge *s = &adap->sge;
-
-	/*
-	 * Set up our basic SGE mode to deliver CPL messages to our Ingress
-	 * Queue and Packet Date to the Free List.
-	 */
-	t4_set_reg_field(adap, SGE_CONTROL_A, RXPKTCPLMODE_F, RXPKTCPLMODE_F);
-
-	/*
-	 * Set up to drop DOORBELL writes when the DOORBELL FIFO overflows
-	 * and generate an interrupt when this occurs so we can recover.
-	 */
-	if (is_t4(adap->params.chip)) {
-		t4_set_reg_field(adap, SGE_DBFIFO_STATUS_A,
-				 HP_INT_THRESH_V(HP_INT_THRESH_M) |
-				 LP_INT_THRESH_V(LP_INT_THRESH_M),
-				 HP_INT_THRESH_V(dbfifo_int_thresh) |
-				 LP_INT_THRESH_V(dbfifo_int_thresh));
-	} else {
-		t4_set_reg_field(adap, SGE_DBFIFO_STATUS_A,
-				 LP_INT_THRESH_T5_V(LP_INT_THRESH_T5_M),
-				 LP_INT_THRESH_T5_V(dbfifo_int_thresh));
-		t4_set_reg_field(adap, SGE_DBFIFO_STATUS2_A,
-				 HP_INT_THRESH_T5_V(HP_INT_THRESH_T5_M),
-				 HP_INT_THRESH_T5_V(dbfifo_int_thresh));
-	}
-	t4_set_reg_field(adap, SGE_DOORBELL_CONTROL_A, ENABLE_DROP_F,
-			 ENABLE_DROP_F);
-
-	/*
-	 * SGE_FL_BUFFER_SIZE0 (RX_SMALL_PG_BUF) is set up by
-	 * t4_fixup_host_params().
-	 */
-	s->fl_pg_order = FL_PG_ORDER;
-	if (s->fl_pg_order)
-		t4_write_reg(adap,
-			     SGE_FL_BUFFER_SIZE0_A+RX_LARGE_PG_BUF*sizeof(u32),
-			     PAGE_SIZE << FL_PG_ORDER);
-	t4_write_reg(adap, SGE_FL_BUFFER_SIZE0_A+RX_SMALL_MTU_BUF*sizeof(u32),
-		     FL_MTU_SMALL_BUFSIZE(adap));
-	t4_write_reg(adap, SGE_FL_BUFFER_SIZE0_A+RX_LARGE_MTU_BUF*sizeof(u32),
-		     FL_MTU_LARGE_BUFSIZE(adap));
-
-	/*
-	 * Note that the SGE Ingress Packet Count Interrupt Threshold and
-	 * Timer Holdoff values must be supplied by our caller.
-	 */
-	t4_write_reg(adap, SGE_INGRESS_RX_THRESHOLD_A,
-		     THRESHOLD_0_V(s->counter_val[0]) |
-		     THRESHOLD_1_V(s->counter_val[1]) |
-		     THRESHOLD_2_V(s->counter_val[2]) |
-		     THRESHOLD_3_V(s->counter_val[3]));
-	t4_write_reg(adap, SGE_TIMER_VALUE_0_AND_1_A,
-		     TIMERVALUE0_V(us_to_core_ticks(adap, s->timer_val[0])) |
-		     TIMERVALUE1_V(us_to_core_ticks(adap, s->timer_val[1])));
-	t4_write_reg(adap, SGE_TIMER_VALUE_2_AND_3_A,
-		     TIMERVALUE2_V(us_to_core_ticks(adap, s->timer_val[2])) |
-		     TIMERVALUE3_V(us_to_core_ticks(adap, s->timer_val[3])));
-	t4_write_reg(adap, SGE_TIMER_VALUE_4_AND_5_A,
-		     TIMERVALUE4_V(us_to_core_ticks(adap, s->timer_val[4])) |
-		     TIMERVALUE5_V(us_to_core_ticks(adap, s->timer_val[5])));
-
-	return 0;
-}
-
+/**
+ *     t4_sge_init - initialize SGE
+ *     @adap: the adapter
+ *
+ *     Perform low-level SGE code initialization needed every time after a
+ *     chip reset.
+ */
 int t4_sge_init(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
@@ -2959,10 +2886,7 @@ int t4_sge_init(struct adapter *adap)
 		s->fl_align = max(ingpadboundary, ingpackboundary);
 	}
 
-	if (adap->flags & USING_SOFT_PARAMS)
-		ret = t4_sge_init_soft(adap);
-	else
-		ret = t4_sge_init_hard(adap);
+	ret = t4_sge_init_soft(adap);
 	if (ret < 0)
 		return ret;
 
-- 
1.7.1

^ permalink raw reply related

* RE: [PATCH net-next] rhashtable: Lower/upper bucket may map to same lock while shrinking
From: David Laight @ 2015-01-13  9:49 UTC (permalink / raw)
  To: 'Thomas Graf', davem@davemloft.net, Fengguang Wu
  Cc: LKP, linux-kernel@vger.kernel.org,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	netdev@vger.kernel.org
In-Reply-To: <20150112235821.GB16617@casper.infradead.org>

From: Thomas Graf
> Each per bucket lock covers a configurable number of buckets. While
> shrinking, two buckets in the old table contain entries for a single
> bucket in the new table. We need to lock down both while linking.
> Check if they are protected by different locks to avoid a recursive
> lock.

Thought, could the shrunk table use the same locks as the lower half
of the old table?

I also wonder whether shrinking hash tables is ever actually worth
the effort. Most likely they'll need to grow again very quickly.

>  		spin_lock_bh(old_bucket_lock1);
> -		spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> -		spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> +
> +		/* Depending on the lock per buckets mapping, the bucket in
> +		 * the lower and upper region may map to the same lock.
> +		 */
> +		if (old_bucket_lock1 != old_bucket_lock2) {
> +			spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> +		} else {
> +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
> +		}

Acquiring 3 locks of much the same type looks like a locking hierarchy
violation just waiting to happen.

	David

^ permalink raw reply

* Re: [PATCH net-next] rhashtable: unnecessary to use delayed work
From: Ying Xue @ 2015-01-13  9:48 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev
In-Reply-To: <20150113093550.GG20387@casper.infradead.org>

On 01/13/2015 05:35 PM, Thomas Graf wrote:
> On 01/13/15 at 05:00pm, Ying Xue wrote:
>> When we put our declared work task in the global workqueue with
>> schedule_delayed_work(), its delay parameter is always zero.
>> Therefore, we should define a normal work in rhashtable structure
>> instead of a delayed work.
>>
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
> 
>> @@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
>>  
>>  	mutex_lock(&ht->mutex);
>>  
>> -	cancel_delayed_work(&ht->run_work);
>> +	cancel_work_sync(&ht->run_work);
>>  	bucket_table_free(rht_dereference(ht->tbl, ht));
>>  
>>  	mutex_unlock(&ht->mutex);
> 
> I like the patch!
> 
> I think it introduces a possible dead lock though (see below). OTOH, it
> could actually explain the reason for the 0day lock debug splash that
> was reported.
> 
> Dead lock: The worker could already have been kicked off but was
> interrupted before it acquired ht->mutex. rhashtable_destroy() is
> called and acquired ht->mutex. cancel_work_sync() waits for worker to
> finish while holding ht->mutex. Worker can't finish because it needs to
> acquire ht->mutex to do so.
> 
> For the very same reason the reported warning could have been triggered.
> Instead of the dead lock, it would have called bucket_table_free()
> with a deferred resizer still underway.
> 
> What about we do something like this?
> 
> void rhashtable_destroy(struct rhashtable *ht)
> {
>         ht->being_destroyed = true;
> 	cancel_work_sync(&ht->run_work);
> 
> 	mutex_lock(&ht->mutex);
> 	bucket_table_free(rht_dereference(ht->tbl, ht));
> 	mutex_unlock(&ht->mutex);
> }
> 

Damn! I knew your above described deadlock scenario. Thank you for the
nice catch!

> If you agree we can explain this shortly in the commit message and add:
> Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
> 

OK, I will deliver the next version.

By the way, I think we should check the following condition before call
cancel_work_sync(), otherwise, we may cancel an uninitialized work.

(ht->p.grow_decision || ht->p.shrink_decision)

What do you think?

Regards,
Ying

> 

^ permalink raw reply

* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Hannes Frederic Sowa @ 2015-01-13 10:11 UTC (permalink / raw)
  To: Rahul Sharma; +Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel
In-Reply-To: <CAFB3abxcg4gdEh4CJHd_Vx8mZKFmO3kMG=pDjkQYS5awTzFbSQ@mail.gmail.com>

On Di, 2015-01-13 at 09:53 +0530, Rahul Sharma wrote:
> On Mon, Jan 12, 2015 at 5:21 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Mon, Jan 12, 2015 at 04:38:16PM +0530, Rahul Sharma wrote:
> >> Hi Pablo, Hannes
> >>
> >> On Fri, Jan 9, 2015 at 9:20 PM, Hannes Frederic Sowa
> >> <hannes@stressinduktion.org> wrote:
> >> > On Fr, 2015-01-09 at 12:45 +0100, Pablo Neira Ayuso wrote:
> >> >> Hi Hannes,
> >> >>
> >> >> On Fri, Jan 09, 2015 at 12:34:15PM +0100, Hannes Frederic Sowa wrote:
> >> >> > On Fri, Jan 9, 2015, at 08:18, Rahul Sharma wrote:
> >> >> > > Hi Pablo,
> >> >> > >
> >> >> > > On Fri, Jan 9, 2015 at 5:35 AM, Pablo Neira Ayuso <pablo@netfilter.org>
> >> >> > > wrote:
> >> >> > > > On Thu, Jan 08, 2015 at 11:39:16PM +0100, Hannes Frederic Sowa wrote:
> >> >> > > >> Hi Pablo,
> >> >> > > >>
> >> >> > > >> On Thu, Jan 8, 2015, at 21:53, Pablo Neira Ayuso wrote:
> >> >> > > >> > I'm afraid we cannot just get rid of that !ipv6_ext_hdr() check. The
> >> >> > > >> > ipv6_find_hdr() function is designed to return the transport protocol.
> >> >> > > >> > After the proposed change, it will return extension header numbers.
> >> >> > > >> > This will break existing ip6tables rulesets since the `-p' option
> >> >> > > >> > relies on this function to match the transport protocol.
> >> >> > > >> >
> >> >> > > >> > Note that the AH header is skipped (see code a bit below this
> >> >> > > >> > problematic fragmentation handling) so the follow up header after the
> >> >> > > >> > AH header is returned as the transport header.
> >> >> > > >> >
> >> >> > > >> > We can probably return the AH protocol number for non-1st fragments.
> >> >> > > >> > However, that would be something new to ip6tables since nobody has
> >> >> > > >> > ever seen packet matching `-p ah' rules. Thus, we restore control to
> >> >> > > >> > the user to allow this, but we would accept all kind of fragmented AH
> >> >> > > >> > traffic through the firewall since we cannot know what transport
> >> >> > > >> > protocol contains from non-1st fragments (unless I'm missing anything,
> >> >> > > >> > I need to have a closer look at this again tomorrow with fresher
> >> >> > > >> > mind).
> >> >> > > >>
> >> >> > > >> The code in question is guarded by (_frag_off != 0), so we are
> >> >> > > >> definitely processing a non-1st fragment currently. The -p match would
> >> >> > > >> happen at the time when the packet is reassembled and thus ipv6_find_hdr
> >> >> > > >> will find the real transport (final) header at this point (I hope I
> >> >> > > >> followed the code correctly here).
> >> >> > > >
> >> >> > > > Then, Rahul should get things working by modprobing nf_defrag_ipv6.
> >> >> > >
> >> >> > > I already had nf_defrag_ipv6 installed when the issue occured. But I
> >> >> > > see ip6table_raw_hook returning NF_DROP for the second fragment.
> >> >> >
> >> >> > That's what I expected. I think the change only affects hooks before
> >> >> > reassembly.
> >> >>
> >> >> reassembly happens at NF_IP6_PRI_CONNTRACK_DEFRAG (-400), so that
> >> >> happens before NF_IP6_PRI_RAW (-300) in IPv6 which is where the raw
> >> >> table is placed.
> >> >
> >> > I tried to reproduce it, but couldn't get non-1st fragments getting
> >> > dropped during traversal of the raw table. They get dropped earlier at
> >> > during reassembly or pass.
> >> >
> >> > I agree with Pablo, I also would like to see more data.
> >> >
> >> > Thanks,
> >> > Hannes
> >> >
> >> >
> >>
> >> I enabled pr_debug() and there was no error in nf_ct_frag6_gather().
> >> It seems to have defragmented the packet correctly. As expected,
> >> ipv6_defrag() returns NF_STOLEN for the first packet after queuing it.
> >> For the next fragment, ipv6_defrag() calls nf_ct_frag6_output() after
> >> after reassembling it.
> >
> > nf_ct_frag6_output() doesn't exist anymore. You're using an old
> > kernel, you should have started by telling so in your report.
> >
> > See 6aafeef ("netfilter: push reasm skb through instead of original
> > frag skbs").
> 
>  I apologize for not mentioning the kernel version in my first mail. I
> had suspected problem in ipv6_find_hdr, the code for which was same.
> Anyway, thanks for the help. I ll try to figure out how to make this
> work in my kernel.

If you have time could you quickly test a recent net-next kernel?

Thanks,
Hannes

^ permalink raw reply

* Re: why are IPv6 addresses removed on link down
From: Hannes Frederic Sowa @ 2015-01-13 10:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Ahern, netdev@vger.kernel.org
In-Reply-To: <20150112231021.316648e3@urahara>

On Mo, 2015-01-12 at 23:10 -0800, Stephen Hemminger wrote:
> On Mon, 12 Jan 2015 22:06:44 -0700
> David Ahern <dsahern@gmail.com> wrote:
> 
> > We noticed that IPv6 addresses are removed on a link down. e.g.,
> >    ip link set dev eth1
> > 
> > 
> > Looking at the code it appears to be this code path in addrconf.c:
> > 
> >          case NETDEV_DOWN:
> >          case NETDEV_UNREGISTER:
> >                  /*
> >                   *      Remove all addresses from this interface.
> >                   */
> >                  addrconf_ifdown(dev, event != NETDEV_DOWN);
> >                  break;
> > 
> > IPv4 addresses are NOT removed on a link down. Is there a particular 
> > reason IPv6 addresses are?
> > 
> > Thanks,
> > David
> 
> See RFC's which describes how IPv6 does Duplicate Address Detection.
> Address is not valid when link is down, since DAD is not possible.

It should be no problem if the kernel would reacquire them on ifup and
do proper DAD. We simply must not use them while the interface is dead
(also making sure they don't get used for loopback routing).

The problem the IPv6 addresses get removed is much more a historical
artifact nowadays, I think. It is part of user space API and scripts
deal with that already.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH v5] can: Convert to runtime_pm
From: Marc Kleine-Budde @ 2015-01-13 11:08 UTC (permalink / raw)
  To: Sören Brinkmann, Kedareswara rao Appana
  Cc: wg, michal.simek, grant.likely, robh+dt, linux-can, netdev,
	linux-arm-kernel, linux-kernel, devicetree,
	Kedareswara rao Appana
In-Reply-To: <3a3437c5c8ff48d9a45fee7e81fa8dca@BY2FFO11FD058.protection.gbl>

[-- Attachment #1: Type: text/plain, Size: 3138 bytes --]

On 01/12/2015 07:45 PM, Sören Brinkmann wrote:
> On Mon, 2015-01-12 at 08:34PM +0530, Kedareswara rao Appana wrote:
>> Instead of enabling/disabling clocks at several locations in the driver,
>> Use the runtime_pm framework. This consolidates the actions for runtime PM
>> In the appropriate callbacks and makes the driver more readable and mantainable.
>>
>> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
>> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
>> ---
>> Changes for v5:
>>  - Updated with the review comments.
>>    Updated the remove fuction to use runtime_pm.
>> Chnages for v4:
>>  - Updated with the review comments.
>> Changes for v3:
>>   - Converted the driver to use runtime_pm.
>> Changes for v2:
>>   - Removed the struct platform_device* from suspend/resume
>>     as suggest by Lothar.
>>
>>  drivers/net/can/xilinx_can.c |  157 ++++++++++++++++++++++++++++-------------
>>  1 files changed, 107 insertions(+), 50 deletions(-)
> [..]
>> +static int __maybe_unused xcan_runtime_resume(struct device *dev)
>>  {
>> -	struct platform_device *pdev = dev_get_drvdata(dev);
>> -	struct net_device *ndev = platform_get_drvdata(pdev);
>> +	struct net_device *ndev = dev_get_drvdata(dev);
>>  	struct xcan_priv *priv = netdev_priv(ndev);
>>  	int ret;
>> +	u32 isr, status;
>>  
>>  	ret = clk_enable(priv->bus_clk);
>>  	if (ret) {
>> @@ -1014,15 +1030,28 @@ static int __maybe_unused xcan_resume(struct device *dev)
>>  	ret = clk_enable(priv->can_clk);
>>  	if (ret) {
>>  		dev_err(dev, "Cannot enable clock.\n");
>> -		clk_disable_unprepare(priv->bus_clk);
>> +		clk_disable(priv->bus_clk);
> [...]
>> @@ -1173,12 +1219,23 @@ static int xcan_remove(struct platform_device *pdev)
>>  {
>>  	struct net_device *ndev = platform_get_drvdata(pdev);
>>  	struct xcan_priv *priv = netdev_priv(ndev);
>> +	int ret;
>> +
>> +	ret = pm_runtime_get_sync(&pdev->dev);
>> +	if (ret < 0) {
>> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
>> +				__func__, ret);
>> +		return ret;
>> +	}
>>  
>>  	if (set_reset_mode(ndev) < 0)
>>  		netdev_err(ndev, "mode resetting failed!\n");
>>  
>>  	unregister_candev(ndev);
>> +	pm_runtime_disable(&pdev->dev);
>>  	netif_napi_del(&priv->napi);
>> +	clk_disable_unprepare(priv->bus_clk);
>> +	clk_disable_unprepare(priv->can_clk);
> 
> Shouldn't pretty much all these occurrences of clk_disable/enable
> disappear? This should all be handled by the runtime_pm framework now.

We have:
- clk_prepare_enable() in probe
- clk_disable_unprepare() in remove
- clk_enable() in runtime_resume
- clk_disable() in runtime_suspend

Which is, as far as I understand the right way to do it. Maybe
Kedareswara can post the clock debug output again with this patch
iteration. Have I missed something?

regards,
Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [RFC 1/2] misc: uidstat: Add uid stat driver to collect network statistics.
From: Kiran Raparthy @ 2015-01-13 11:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mike Chan, Arnd Bergmann, Greg Kroah-Hartman, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, Android Kernel Team, John Stultz,
	Sumit Semwal, JP Abgrall, Arve Hj�nnev�g,
	Kiran Raparthy

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 9395 bytes --]

From: Mike Chan <mike@android.com>

misc: uidstat: Add uid stat driver to collect network statistics.

To analyze application's network statistics, we need a mechanism to export
the UID based statistics to userspace so that userspace tools can use the
exported numbers and generate the report against the UID.

This patch allows the user to explore the UID based network statistics
exported to /proc/uid_stat.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
cc: Mike Chan <mike@android.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Android Kernel Team <kernel-team@android.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: JP Abgrall <jpa@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Mike Chan <mike@android.com>
[Kiran: Added context to commit message.
included build fixes from JP Abgrall and Arve.
Used single_release instead of seq_release.
internally continued to use uid_t but used covnersion
from kuid_t where ever necessary]
Signed-off-by: Kiran Raparthy <kiran.kumar@linaro.org>
---
 drivers/misc/Kconfig     |   7 +++
 drivers/misc/Makefile    |   1 +
 drivers/misc/uid_stat.c  | 157 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/uid_stat.h |  33 ++++++++++
 net/ipv4/tcp.c           |  10 +++
 5 files changed, 208 insertions(+)
 create mode 100644 drivers/misc/uid_stat.c
 create mode 100644 include/linux/uid_stat.h

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 006242c..1c79b94 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -402,6 +402,13 @@ config TI_DAC7512
 	  This driver can also be built as a module. If so, the module
 	  will be called ti_dac7512.
 
+config UID_STAT
+	bool "UID based statistics tracking"
+	default n
+	help
+	  This option allows to enable UID based network statistics tracking.
+	  statistics are exported to /proc/uid_stat.
+
 config VMWARE_BALLOON
 	tristate "VMware Balloon Driver"
 	depends on X86 && HYPERVISOR_GUEST
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 7d5c4cd..3754347 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_ISL29020)		+= isl29020.o
 obj-$(CONFIG_SENSORS_TSL2550)	+= tsl2550.o
 obj-$(CONFIG_DS1682)		+= ds1682.o
 obj-$(CONFIG_TI_DAC7512)	+= ti_dac7512.o
+obj-$(CONFIG_UID_STAT)		+= uid_stat.o
 obj-$(CONFIG_C2PORT)		+= c2port/
 obj-$(CONFIG_HMC6352)		+= hmc6352.o
 obj-y				+= eeprom/
diff --git a/drivers/misc/uid_stat.c b/drivers/misc/uid_stat.c
new file mode 100644
index 0000000..aaaa406
--- /dev/null
+++ b/drivers/misc/uid_stat.c
@@ -0,0 +1,157 @@
+/* drivers/misc/uid_stat.c
+ *
+ * Copyright (C) 2008 - 2009 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/stat.h>
+#include <linux/uid_stat.h>
+
+static DEFINE_SPINLOCK(uid_lock);
+static LIST_HEAD(uid_list);
+static struct proc_dir_entry *parent;
+
+struct uid_stat {
+	struct list_head link;
+	uid_t uid;
+	atomic_t tcp_rcv;
+	atomic_t tcp_snd;
+};
+
+static struct uid_stat *find_uid_stat(uid_t uid)
+{
+	struct uid_stat *entry;
+
+	list_for_each_entry(entry, &uid_list, link) {
+		if (entry->uid == uid)
+			return entry;
+	}
+	return NULL;
+}
+
+static int uid_stat_atomic_int_show(struct seq_file *m, void *v)
+{
+	unsigned int bytes;
+	atomic_t *counter = m->private;
+
+	bytes = (unsigned int) (atomic_read(counter) + INT_MIN);
+	return seq_printf(m, "%u\n", bytes);
+}
+
+static int uid_stat_read_atomic_int_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, uid_stat_atomic_int_show, PDE_DATA(inode));
+}
+
+static const struct file_operations uid_stat_read_atomic_int_fops = {
+	.open		= uid_stat_read_atomic_int_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+/* Create a new entry for tracking the specified uid. */
+static struct uid_stat *create_stat(uid_t uid)
+{
+	struct uid_stat *new_uid;
+	/* Create the uid stat struct and append it to the list. */
+	new_uid = kmalloc(sizeof(struct uid_stat), GFP_ATOMIC);
+	if (!new_uid)
+		return NULL;
+
+	new_uid->uid = uid;
+	/* Counters start at INT_MIN, so we can track 4GB of network traffic. */
+	atomic_set(&new_uid->tcp_rcv, INT_MIN);
+	atomic_set(&new_uid->tcp_snd, INT_MIN);
+
+	list_add_tail(&new_uid->link, &uid_list);
+	return new_uid;
+}
+
+static void create_stat_proc(struct uid_stat *new_uid)
+{
+	char uid_s[32];
+	struct proc_dir_entry *entry;
+
+	sprintf(uid_s, "%d", new_uid->uid);
+	entry = proc_mkdir(uid_s, parent);
+
+	/* Keep reference to uid_stat so we know what uid to read stats from. */
+	proc_create_data("tcp_snd", S_IRUGO, entry,
+			 &uid_stat_read_atomic_int_fops, &new_uid->tcp_snd);
+
+	proc_create_data("tcp_rcv", S_IRUGO, entry,
+			 &uid_stat_read_atomic_int_fops, &new_uid->tcp_rcv);
+}
+
+static struct uid_stat *find_or_create_uid_stat(kuid_t uid)
+{
+	struct uid_stat *entry;
+	unsigned long flags;
+	uid_t proc_uid;
+
+	proc_uid = from_kuid(&init_user_ns, uid);
+	spin_lock_irqsave(&uid_lock, flags);
+	entry = find_uid_stat(proc_uid);
+	if (entry) {
+		spin_unlock_irqrestore(&uid_lock, flags);
+		return entry;
+	}
+	entry = create_stat(proc_uid);
+	spin_unlock_irqrestore(&uid_lock, flags);
+	if (entry)
+		create_stat_proc(entry);
+	return entry;
+}
+
+int uid_stat_tcp_snd(kuid_t uid, int size)
+{
+	struct uid_stat *entry;
+
+	entry = find_or_create_uid_stat(uid);
+	if (!entry)
+		return -1;
+	atomic_add(size, &entry->tcp_snd);
+	return 0;
+}
+
+int uid_stat_tcp_rcv(kuid_t uid, int size)
+{
+	struct uid_stat *entry;
+
+	entry = find_or_create_uid_stat(uid);
+	if (!entry)
+		return -1;
+	atomic_add(size, &entry->tcp_rcv);
+	return 0;
+}
+
+static int __init uid_stat_init(void)
+{
+	parent = proc_mkdir("uid_stat", NULL);
+	if (!parent) {
+		pr_err("uid_stat: failed to create proc entry\n");
+		return -1;
+	}
+	return 0;
+}
+
+device_initcall(uid_stat_init);
diff --git a/include/linux/uid_stat.h b/include/linux/uid_stat.h
new file mode 100644
index 0000000..68823d3
--- /dev/null
+++ b/include/linux/uid_stat.h
@@ -0,0 +1,33 @@
+/* include/linux/uid_stat.h
+ *
+ * Copyright (C) 2008-2009 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef __uid_stat_h
+#define __uid_stat_h
+
+/* Contains definitions for resource tracking per uid. */
+
+#ifdef CONFIG_UID_STAT
+int uid_stat_tcp_snd(kuid_t uid, int size);
+int uid_stat_tcp_rcv(kuid_t uid, int size);
+#else
+static inline int uid_stat_tcp_snd(kuid_t uid, int size)
+{
+}
+static inline int uid_stat_tcp_rcv(kuid_t uid, int size)
+{
+}
+#endif
+
+#endif /* _LINUX_UID_STAT_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3075723..00eb156 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -268,6 +268,7 @@
 #include <linux/crypto.h>
 #include <linux/time.h>
 #include <linux/slab.h>
+#include <linux/uid_stat.h>
 
 #include <net/icmp.h>
 #include <net/inet_common.h>
@@ -1280,6 +1281,9 @@ out:
 		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
 out_nopush:
 	release_sock(sk);
+
+	if (copied + copied_syn)
+		uid_stat_tcp_snd(current_uid(), copied + copied_syn);
 	return copied + copied_syn;
 
 do_fault:
@@ -1551,6 +1555,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 	if (copied > 0) {
 		tcp_recv_skb(sk, seq, &offset);
 		tcp_cleanup_rbuf(sk, copied);
+		uid_stat_tcp_rcv(current_uid(), copied);
 	}
 	return copied;
 }
@@ -1879,6 +1884,9 @@ skip_copy:
 	tcp_cleanup_rbuf(sk, copied);
 
 	release_sock(sk);
+
+	if (copied > 0)
+		uid_stat_tcp_rcv(current_uid(), copied);
 	return copied;
 
 out:
@@ -1887,6 +1895,8 @@ out:
 
 recv_urg:
 	err = tcp_recv_urg(sk, msg, len, flags);
+	if (err > 0)
+		uid_stat_tcp_rcv(current_uid(), err);
 	goto out;
 
 recv_sndq:
-- 
1.8.2.1

^ permalink raw reply related

* [RFC 2/2] net: activity_stats: Add statistics for network transmission activity
From: Kiran Raparthy @ 2015-01-13 11:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mike Chan, Arnd Bergmann, Greg Kroah-Hartman, David S. Miller,
	netdev, Android Kernel Team, John Stultz, Sumit Semwal,
	Arve Hj�nnev�g, Kiran Raparthy
In-Reply-To: <1421147642-28360-1-git-send-email-kiran.kumar@linaro.org>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 7811 bytes --]

From: Mike Chan <mike@android.com>

net: activity_stats: Add statistics for network transmission activity

When enabled, tracks the frequency of network transmissions
(inbound and outbound) and buckets them accordingly.
Buckets are determined by time between network activity.

Each bucket represents the number of network transmisions that were
N sec or longer apart. Where N is defined as 1 << bucket index.

This network pattern tracking is particularly useful for wireless
networks (ie: 3G) where batching network activity closely together
is more power efficient than far apart.

New file: /proc/net/stat/activity

output:

Min Bucket(sec) Count
              1 7
              2 0
              4 1
              8 0
             16 0
             32 2
             64 1
            128 0

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Android Kernel Team <kernel-team@android.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Mike Chan <mike@android.com>
[Kiran: Added context to commit message.
included build fix from Arve,continued to use uid_t
but used covnersion from kuid_t where ever necessary]
Signed-off-by: Kiran Raparthy <kiran.kumar@linaro.org>
---
 drivers/misc/uid_stat.c      |   3 ++
 include/net/activity_stats.h |  25 ++++++++++
 net/Kconfig                  |   8 +++
 net/Makefile                 |   1 +
 net/activity_stats.c         | 116 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 153 insertions(+)
 create mode 100644 include/net/activity_stats.h
 create mode 100644 net/activity_stats.c

diff --git a/drivers/misc/uid_stat.c b/drivers/misc/uid_stat.c
index aaaa406..63065e2 100644
--- a/drivers/misc/uid_stat.c
+++ b/drivers/misc/uid_stat.c
@@ -24,6 +24,7 @@
 #include <linux/spinlock.h>
 #include <linux/stat.h>
 #include <linux/uid_stat.h>
+#include <net/activity_stats.h>
 
 static DEFINE_SPINLOCK(uid_lock);
 static LIST_HEAD(uid_list);
@@ -126,6 +127,7 @@ int uid_stat_tcp_snd(kuid_t uid, int size)
 {
 	struct uid_stat *entry;
 
+	activity_stats_update();
 	entry = find_or_create_uid_stat(uid);
 	if (!entry)
 		return -1;
@@ -137,6 +139,7 @@ int uid_stat_tcp_rcv(kuid_t uid, int size)
 {
 	struct uid_stat *entry;
 
+	activity_stats_update();
 	entry = find_or_create_uid_stat(uid);
 	if (!entry)
 		return -1;
diff --git a/include/net/activity_stats.h b/include/net/activity_stats.h
new file mode 100644
index 0000000..10e4c15
--- /dev/null
+++ b/include/net/activity_stats.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Author: Mike Chan (mike@android.com)
+ */
+
+#ifndef __activity_stats_h
+#define __activity_stats_h
+
+#ifdef CONFIG_NET_ACTIVITY_STATS
+void activity_stats_update(void);
+#else
+#define activity_stats_update(void) {}
+#endif
+
+#endif /* _NET_ACTIVITY_STATS_H */
diff --git a/net/Kconfig b/net/Kconfig
index ff9ffc1..3159361 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -83,6 +83,14 @@ source "net/netlabel/Kconfig"
 
 endif # if INET
 
+config NET_ACTIVITY_STATS
+	bool "Network activity statistics tracking"
+	default y
+	help
+	 Network activity statistics are useful for tracking wireless
+	 modem activity on 2G, 3G, 4G wireless networks. Counts number of
+	 transmissions and groups them in specified time buckets.
+
 config NETWORK_SECMARK
 	bool "Security Marking"
 	help
diff --git a/net/Makefile b/net/Makefile
index 95fc694..d81a969 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
 obj-$(CONFIG_HSR)		+= hsr/
+obj-$(CONFIG_NET_ACTIVITY_STATS)		+= activity_stats.o
 ifneq ($(CONFIG_NET_SWITCHDEV),)
 obj-y				+= switchdev/
 endif
diff --git a/net/activity_stats.c b/net/activity_stats.c
new file mode 100644
index 0000000..cf0a09d
--- /dev/null
+++ b/net/activity_stats.c
@@ -0,0 +1,116 @@
+/* net/activity_stats.c
+ *
+ * Copyright (C) 2010 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Author: Mike Chan (mike@android.com)
+ */
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/suspend.h>
+#include <net/net_namespace.h>
+
+/* Track transmission rates in buckets (power of 2).
+ * 1,2,4,8...512 seconds.
+ *
+ * Buckets represent the count of network transmissions at least
+ * N seconds apart, where N is 1 << bucket index.
+ */
+#define BUCKET_MAX 10
+
+/* Track network activity frequency */
+static unsigned long activity_stats[BUCKET_MAX];
+static ktime_t last_transmit;
+static ktime_t suspend_time;
+static DEFINE_SPINLOCK(activity_lock);
+
+void activity_stats_update(void)
+{
+	int i;
+	unsigned long flags;
+	ktime_t now;
+	s64 delta;
+
+	spin_lock_irqsave(&activity_lock, flags);
+	now = ktime_get();
+	delta = ktime_to_ns(ktime_sub(now, last_transmit));
+
+	for (i = BUCKET_MAX - 1; i >= 0; i--) {
+		/* Check if the time delta between network activity is within
+		 * the minimum bucket range.
+		 */
+		if (delta < (1000000000ULL << i))
+			continue;
+
+		activity_stats[i]++;
+		last_transmit = now;
+		break;
+	}
+	spin_unlock_irqrestore(&activity_lock, flags);
+}
+
+static int activity_stats_show(struct seq_file *m, void *v)
+{
+	int i;
+	int ret;
+
+	seq_printf(m, "Min Bucket(sec) Count\n");
+
+	for (i = 0; i < BUCKET_MAX; i++) {
+		ret = seq_printf(m, "%15d %lu\n", 1 << i, activity_stats[i]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int activity_stats_notifier(struct notifier_block *nb,
+				   unsigned long event, void *dummy)
+{
+	switch (event) {
+	case PM_SUSPEND_PREPARE:
+		suspend_time = ktime_get_real();
+		break;
+
+	case PM_POST_SUSPEND:
+		suspend_time = ktime_sub(ktime_get_real(), suspend_time);
+		last_transmit = ktime_sub(last_transmit, suspend_time);
+	}
+
+	return 0;
+}
+
+static int activity_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, activity_stats_show, PDE_DATA(inode));
+}
+
+static const struct file_operations activity_stats_fops = {
+	.open		= activity_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
+static struct notifier_block activity_stats_notifier_block = {
+	.notifier_call = activity_stats_notifier,
+};
+
+static int  __init activity_stats_init(void)
+{
+	proc_create("activity", S_IRUGO,
+		    init_net.proc_net_stat, &activity_stats_fops);
+	return register_pm_notifier(&activity_stats_notifier_block);
+}
+
+subsys_initcall(activity_stats_init);
-- 
1.8.2.1

^ permalink raw reply related

* Re: [PATCH net-next] rhashtable: Lower/upper bucket may map to same lock while shrinking
From: 'Thomas Graf' @ 2015-01-13 11:25 UTC (permalink / raw)
  To: David Laight
  Cc: davem@davemloft.net, Fengguang Wu, LKP,
	linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
	coreteam@netfilter.org, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1CAC6593@AcuExch.aculab.com>

On 01/13/15 at 09:49am, David Laight wrote:
> From: Thomas Graf
> > Each per bucket lock covers a configurable number of buckets. While
> > shrinking, two buckets in the old table contain entries for a single
> > bucket in the new table. We need to lock down both while linking.
> > Check if they are protected by different locks to avoid a recursive
> > lock.
> 
> Thought, could the shrunk table use the same locks as the lower half
> of the old table?

No. A new bucket table and thus a new set of locks is allocated when the
table is shrunk or grown. We only have check for overlapping locks
when holding multiple locks for the same table at the same time.

> I also wonder whether shrinking hash tables is ever actually worth
> the effort. Most likely they'll need to grow again very quickly.

Specifying a .shrink_decision function is optional so every rhashtable
user can decide whether it wants shrinking or not. Need for it was
expressed in the past threads.

Also, the case of multiple buckets mapping to the same lock is also
present in the expanding logic so removing the shrinking logic would
not remove the need for these types of checks.

> >  		spin_lock_bh(old_bucket_lock1);
> > -		spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> > -		spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> > +
> > +		/* Depending on the lock per buckets mapping, the bucket in
> > +		 * the lower and upper region may map to the same lock.
> > +		 */
> > +		if (old_bucket_lock1 != old_bucket_lock2) {
> > +			spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> > +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> > +		} else {
> > +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
> > +		}
> 
> Acquiring 3 locks of much the same type looks like a locking hierarchy
> violation just waiting to happen.

I'm not claiming it's extremely pretty, lockless lookup with deferred
resizing doesn't come for free ;-) If you have a suggestion on how to
implement this differently I'm all ears. That said, it's well isolated
and the user of rhashtable does not have to deal with it. All code paths
which take multiple locks are mutually exclusive to each other (ht->mutex).

^ permalink raw reply

* Re: [PATCH net-next] rhashtable: unnecessary to use delayed work
From: Thomas Graf @ 2015-01-13 11:26 UTC (permalink / raw)
  To: Ying Xue; +Cc: davem, netdev
In-Reply-To: <54B4EA06.8010507@windriver.com>

On 01/13/15 at 05:48pm, Ying Xue wrote:
> On 01/13/2015 05:35 PM, Thomas Graf wrote:
> > If you agree we can explain this shortly in the commit message and add:
> > Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
> 
> OK, I will deliver the next version.
> 
> By the way, I think we should check the following condition before call
> cancel_work_sync(), otherwise, we may cancel an uninitialized work.
> 
> (ht->p.grow_decision || ht->p.shrink_decision)
> 
> What do you think?

+1

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Thomas Graf @ 2015-01-13 11:32 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Alexei Starovoitov, Linux Netdev List, dev@openvswitch.org
In-Reply-To: <CA+mtBx8-rMYSkj2KKXBq60KeF=V=iCG2K0HgVxO5JMVK1yhwCg@mail.gmail.com>

On 01/12/15 at 06:28pm, Tom Herbert wrote:
> On Mon, Jan 12, 2015 at 5:03 PM, Thomas Graf <tgraf@suug.ch> wrote:
> >>
> >> Creating a level of indirection for extensions seems overly
> >> complicated to me. Why not just define IFLA_VXLAN_GBP as just another
> >> enum above?
> >
> > I think it's cleaner to group them in a nested attribute.
> > It clearly separates the optional extensions from the base
> > attributes. RCO, GPE, GBP can all live in there.
> 
> This is inconsistent with similar things in GRE and GUE. For instance,
> GRE keyid is set as its own attribute. It just seems like this adding
> more code to the driver than is necessary for the functionality
> needed.

The major difference here is that we have to consider backwards
compatibility specifically for VXLAN. Your initial feedback on GPE
actually led me to how I implemented GBP.

I think the axioms we want to establish are as follows:
 1. Extensions need to be explicitly enabled by the user. A previously
    dropped frame should only be processed if the user explitly asks
    for it.
 2. As a consequence: only share a VLXAN UDP port if the enabled
    extensions match (vxlan_sock_add), e.g. user A might want RCO
    but user B might be unaware. They cannot share the same UDP port.

The 2nd lead me to introduce the 'exts' member to vxlan_sock so we can
compare it in vxlan_find_sock() and only share a UDP port if the
enabled extensions match.

Your patch currently implements (1) but not (2).

^ permalink raw reply

* [net-next 00/15][pull request] Intel Wired LAN Driver Updates 2015-01-13
From: Jeff Kirsher @ 2015-01-13 11:33 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to i40e and i40evf.

Mitch provides a fix for i40e to move the call to pci_disable_sriov() so
that it is called earlier to ensure that the PF driver won't free VF
resources before the VF remove routine can complete.  Also cleans up
redundant and duplicate code in the i40evf.  Refactors the i40evf
shutdown code and let the watchdog take care of shutting things down.
Fix a possible memory leak, if we are using VLANs and the communication
with the PF fail during shutdown.  On some versions of the firmware, the
VF admin send queue may become stalled.  In this case, the easiest
solution is to place another descriptor on the queue and the firmware
will then process both requests.

Greg adds a warning when the NPAR enabled partitions detected a link speed
less than 10 Gpbs.

Vasu removes redundant VN2VN MAC address which were already added by
the FCoE stack.

Shannon adds code to find how many partitions there are per port and
what is the current partition_id when in NPAR mode.  In multifunction
mode, make sure we only allow SR/IOV on the master PF of a port and
only allow partition 1 to set WoL, speed and flow control.

Kamil adds code to read the PBA block from shadow RAM and returns
the part number in a string format.

Catherine provides a fix to check if link state and link speed has
changed before exiting link event

The following are changes since commit d2c60b1350c9a3eb7ed407c18f50306762365646:
  Merge branch 'tuntap_queues'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Catherine Sullivan (1):
  i40e: Don't exit link event early if link speed has changed

Greg Rose (1):
  i40e: Add warning for NPAR partitions with link speed less than 10Gbps

Kamil Krawczyk (1):
  i40e: Adding function for reading PBA String

Mitch A Williams (8):
  i40e: disable IOV before freeing resources
  i40evf: remove redundant code
  i40evf: Remove some scary log messages
  i40evf: refactor shutdown code
  i40evf: remove leftover VLAN filters
  i40evf: don't fire traffic IRQs when the interface is down
  i40evf: enable interrupt 0 appropriately
  i40evf: kick a stalled admin queue

Shannon Nelson (3):
  i40e/i40evf: find partition_id in npar mode
  i40e: limit WoL and link settings to partition 1
  i40e: limit sriov to partition 1 of NPAR configurations

Vasu Dev (1):
  i40e: remove VN2VN related mac filters

 drivers/net/ethernet/intel/i40e/i40e_common.c      | 125 +++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  43 ++++++-
 drivers/net/ethernet/intel/i40e/i40e_fcoe.c        |   2 -
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  15 ++-
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   5 +
 drivers/net/ethernet/intel/i40e/i40e_type.h        |   9 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  11 +-
 drivers/net/ethernet/intel/i40evf/i40e_type.h      |   7 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  94 ++++++++--------
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    |   6 +-
 10 files changed, 256 insertions(+), 61 deletions(-)

-- 
1.9.3

^ permalink raw reply

* [net-next 02/15] i40evf: remove redundant code
From: Jeff Kirsher @ 2015-01-13 11:33 UTC (permalink / raw)
  To: davem; +Cc: Mitch A Williams, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1421148811-9763-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch A Williams <mitch.a.williams@intel.com>

These functions are redundant and duplicate functionality found in
i40evf_free_all_[tx|rx]_resources.

Change-ID: Ia199908926d7a1a4b8247f75f89b5da24c9b149c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 27 -------------------------
 1 file changed, 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index cabaf59..ee0db59 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -947,30 +947,6 @@ static int i40evf_up_complete(struct i40evf_adapter *adapter)
 }
 
 /**
- * i40evf_clean_all_rx_rings - Free Rx Buffers for all queues
- * @adapter: board private structure
- **/
-static void i40evf_clean_all_rx_rings(struct i40evf_adapter *adapter)
-{
-	int i;
-
-	for (i = 0; i < adapter->num_active_queues; i++)
-		i40evf_clean_rx_ring(adapter->rx_rings[i]);
-}
-
-/**
- * i40evf_clean_all_tx_rings - Free Tx Buffers for all queues
- * @adapter: board private structure
- **/
-static void i40evf_clean_all_tx_rings(struct i40evf_adapter *adapter)
-{
-	int i;
-
-	for (i = 0; i < adapter->num_active_queues; i++)
-		i40evf_clean_tx_ring(adapter->tx_rings[i]);
-}
-
-/**
  * i40e_down - Shutdown the connection processing
  * @adapter: board private structure
  **/
@@ -1008,9 +984,6 @@ void i40evf_down(struct i40evf_adapter *adapter)
 	i40evf_napi_disable_all(adapter);
 
 	netif_carrier_off(netdev);
-
-	i40evf_clean_all_tx_rings(adapter);
-	i40evf_clean_all_rx_rings(adapter);
 }
 
 /**
-- 
1.9.3

^ permalink raw reply related

* [net-next 03/15] i40evf: Remove some scary log messages
From: Jeff Kirsher @ 2015-01-13 11:33 UTC (permalink / raw)
  To: davem; +Cc: Mitch A Williams, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1421148811-9763-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch A Williams <mitch.a.williams@intel.com>

These messages may be triggered during normal init of the driver if the
PF or FW take a long time to respond. There's nothing really wrong, so
don't freak people out logging messages.

If the communication channel really is dead, then we'll retry a few
times and give up. This will log a different more scary message that
should cause consternation. This allows the user to more easily detect a
genuine failure.

Change-ID: I6e2b758d4234a3a09c1015c82c8f2442a697cbdb
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c     | 4 ----
 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c | 3 ---
 2 files changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index ee0db59..f8f1d26 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2026,10 +2026,7 @@ static void i40evf_init_task(struct work_struct *work)
 		/* aq msg sent, awaiting reply */
 		err = i40evf_verify_api_ver(adapter);
 		if (err) {
-			dev_info(&pdev->dev, "Unable to verify API version (%d), retrying\n",
-				 err);
 			if (err == I40E_ERR_ADMIN_QUEUE_NO_WORK) {
-				dev_info(&pdev->dev, "Resending request\n");
 				err = i40evf_send_api_ver(adapter);
 			}
 			goto err;
@@ -2054,7 +2051,6 @@ static void i40evf_init_task(struct work_struct *work)
 		}
 		err = i40evf_get_vf_config(adapter);
 		if (err == I40E_ERR_ADMIN_QUEUE_NO_WORK) {
-			dev_info(&pdev->dev, "Resending VF config request\n");
 			err = i40evf_send_vf_config_msg(adapter);
 			goto err;
 		}
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index 5fde5a7..3aeb633 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -715,9 +715,6 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
 		}
 		return;
 	}
-	if (v_opcode != adapter->current_op)
-		dev_info(&adapter->pdev->dev, "Pending op is %d, received %d\n",
-			 adapter->current_op, v_opcode);
 	if (v_retval) {
 		dev_err(&adapter->pdev->dev, "%s: PF returned error %d to our request %d\n",
 			__func__, v_retval, v_opcode);
-- 
1.9.3

^ permalink raw reply related

* [net-next 01/15] i40e: disable IOV before freeing resources
From: Jeff Kirsher @ 2015-01-13 11:33 UTC (permalink / raw)
  To: davem; +Cc: Mitch A Williams, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1421148811-9763-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch A Williams <mitch.a.williams@intel.com>

If VF drivers are loaded in the host OS, the call to pci_disable_sriov()
will cause these drivers' remove routines to be called. If the PF driver
has already freed VF resources before this happens, then the VF remove
routine can't properly communicate with the PF driver causing all sorts
of mayhem and error messages and hurt feelings.

To fix this, we move the call to pci_disable_sriov() up to the top of
the function and let it complete before freeing any VF resources.

Change-ID: I397c3997a00f6408e32b7735273911e499600236
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 5bae895..044019b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -791,10 +791,18 @@ void i40e_free_vfs(struct i40e_pf *pf)
 	if (!pf->vf)
 		return;
 
+	/* Disable IOV before freeing resources. This lets any VF drivers
+	 * running in the host get themselves cleaned up before we yank
+	 * the carpet out from underneath their feet.
+	 */
+	if (!pci_vfs_assigned(pf->pdev))
+		pci_disable_sriov(pf->pdev);
+
+	msleep(20); /* let any messages in transit get finished up */
+
 	/* Disable interrupt 0 so we don't try to handle the VFLR. */
 	i40e_irq_dynamic_disable_icr0(pf);
 
-	mdelay(10); /* let any messages in transit get finished up */
 	/* free up vf resources */
 	tmp = pf->num_alloc_vfs;
 	pf->num_alloc_vfs = 0;
@@ -813,7 +821,6 @@ void i40e_free_vfs(struct i40e_pf *pf)
 	 * before this function ever gets called.
 	 */
 	if (!pci_vfs_assigned(pf->pdev)) {
-		pci_disable_sriov(pf->pdev);
 		/* Acknowledge VFLR for all VFS. Without this, VFs will fail to
 		 * work correctly when SR-IOV gets re-enabled.
 		 */
-- 
1.9.3

^ permalink raw reply related

* [net-next 05/15] i40evf: remove leftover VLAN filters
From: Jeff Kirsher @ 2015-01-13 11:33 UTC (permalink / raw)
  To: davem; +Cc: Mitch A Williams, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1421148811-9763-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch A Williams <mitch.a.williams@intel.com>

If we're using VLANs and communications with the PF fail during
shutdown, we will leak memory because not all of the VLAN filters will
be removed. To eliminate this possibility, go through the list again
right before the module is removed and delete any leftover entries.

Change-ID: Id3b5315c47ca0a61ae123a96ff345d010bc41aed
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 5a2ed90..a21fb88 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2463,6 +2463,10 @@ static void i40evf_remove(struct pci_dev *pdev)
 		list_del(&f->list);
 		kfree(f);
 	}
+	list_for_each_entry_safe(f, ftmp, &adapter->vlan_filter_list, list) {
+		list_del(&f->list);
+		kfree(f);
+	}
 
 	free_netdev(netdev);
 
-- 
1.9.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox