Netdev List


* Re: [PATCH net-next] rhashtable: Add MAINTAINERS entry
From: David Miller @ 2015-01-13  5:25 UTC (permalink / raw)
  To: tgraf; +Cc: netdev
In-Reply-To: <cf5552c116d0fe998a3b660d5850b7c2efd814b5.1421107185.git.tgraf@suug.ch>

From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 13 Jan 2015 01:01:24 +0100

> Signed-off-by: Thomas Graf <tgraf@suug.ch>

Applied.

^ permalink raw reply

* [PATCH v2] gianfar: correct the bad expression while writing bit-pattern
From: Sanjeev Sharma @ 2015-01-13  5:28 UTC (permalink / raw)
  To: davem
  Cc: claudiu.manoil, peter.senna, shemminger, netdev, linux-kernel,
	Sanjeev Sharma, Sanjeev Sharma
In-Reply-To: <54B3D63B.1010608@cogentembedded.com>

This patch corrects the bad expression used when writing the
bit-pattern from the software buffer to the hardware registers.

Signed-off-by: Sanjeev Sharma <Sanjeev_Sharma@mentor.com>
---
Changes in v2:
  - incorporated review comment as per Sergei. 

 drivers/net/ethernet/freescale/gianfar_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 3e1a9c1..347d5ee 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -1586,7 +1586,7 @@ static int gfar_write_filer_table(struct gfar_private *priv,
 		return -EBUSY;
 
 	/* Fill regular entries */
-	for (; i < MAX_FILER_IDX - 1 && (tab->fe[i].ctrl | tab->fe[i].ctrl);
+	for (; i < MAX_FILER_IDX - 1 && i < tab->fe[i].ctrl);
 	     i++)
 		gfar_write_filer(priv, i, tab->fe[i].ctrl, tab->fe[i].prop);
 	/* Fill the rest with fall-troughs */
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCHv2 net-next] openvswitch: Introduce ovs_tunnel_route_lookup
From: Pravin Shelar @ 2015-01-13  5:52 UTC (permalink / raw)
  To: Fan Du; +Cc: dev@openvswitch.org, netdev, fengyuleidian0615
In-Reply-To: <1421116883-26839-1-git-send-email-fan.du@intel.com>

On Mon, Jan 12, 2015 at 6:41 PM, Fan Du <fan.du@intel.com> wrote:
> Introduce ovs_tunnel_route_lookup to consolidate route lookup
> shared by vxlan, gre, and geneve ports.
>
> Signed-off-by: Fan Du <fan.du@intel.com>
> ---
> Change log:
> v2:
>   - Use inline instead of function call
>   - Rename vport_route_lookup to ovs_tunnel_route_lookup
> ---
>  net/openvswitch/vport-geneve.c |   11 +----------
>  net/openvswitch/vport-gre.c    |   10 +---------
>  net/openvswitch/vport-vxlan.c  |   10 +---------
>  net/openvswitch/vport.h        |   18 ++++++++++++++++++
>  4 files changed, 21 insertions(+), 28 deletions(-)
>
....

> +static inline struct rtable *ovs_tunnel_route_lookup(struct net *net,
> +                                                    struct ovs_key_ipv4_tunnel *key,
> +                                                    struct sk_buff *skb,
> +                                                    struct flowi4 *fl,
> +                                                    u8 protocol)
> +{
> +       struct rtable *rt;
> +
> +       memset(fl, 0, sizeof(*fl));
> +       fl->daddr = key->ipv4_dst;
> +       fl->saddr = key->ipv4_src;
> +       fl->flowi4_tos = RT_TOS(key->ipv4_tos);
> +       fl->flowi4_mark = skb->mark;
> +       fl->flowi4_proto = protocol;
> +
> +       rt = ip_route_output_key(net, fl);
> +       return rt;
> +}

ovs_tunnel_get_egress_info() also calls ip_route_output_key() directly.

>  #endif /* vport.h */
> --
> 1.7.1
>

^ permalink raw reply

* Re: [PATCH v2] gianfar: correct the bad expression while writing bit-pattern
From: Eric Dumazet @ 2015-01-13  6:01 UTC (permalink / raw)
  To: Sanjeev Sharma
  Cc: davem, claudiu.manoil, peter.senna, shemminger, netdev,
	linux-kernel
In-Reply-To: <1421126938-16268-1-git-send-email-sanjeev_sharma@mentor.com>

On Tue, 2015-01-13 at 10:58 +0530, Sanjeev Sharma wrote:
> This patch corrects the bad expression used when writing the
> bit-pattern from the software buffer to the hardware registers.
> 
> Signed-off-by: Sanjeev Sharma <Sanjeev_Sharma@mentor.com>
> ---
> Changes in v2:
>   - incorporated review comment as per Sergei. 
> 
>  drivers/net/ethernet/freescale/gianfar_ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> index 3e1a9c1..347d5ee 100644
> --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
> +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> @@ -1586,7 +1586,7 @@ static int gfar_write_filer_table(struct gfar_private *priv,
>  		return -EBUSY;
>  
>  	/* Fill regular entries */
> -	for (; i < MAX_FILER_IDX - 1 && (tab->fe[i].ctrl | tab->fe[i].ctrl);
> +	for (; i < MAX_FILER_IDX - 1 && i < tab->fe[i].ctrl);
>  	     i++)
>  		gfar_write_filer(priv, i, tab->fe[i].ctrl, tab->fe[i].prop);
>  	/* Fill the rest with fall-troughs */

This makes no sense. Have you tried to compile this?

^ permalink raw reply

* Re: [PATCH v2] gianfar: correct the bad expression while writing bit-pattern
From: Eric Dumazet @ 2015-01-13  6:15 UTC (permalink / raw)
  To: Sanjeev Sharma
  Cc: davem, claudiu.manoil, peter.senna, shemminger, netdev,
	linux-kernel
In-Reply-To: <1421128891.4099.15.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, 2015-01-12 at 22:01 -0800, Eric Dumazet wrote:
> On Tue, 2015-01-13 at 10:58 +0530, Sanjeev Sharma wrote:
> > This patch corrects the bad expression used when writing the
> > bit-pattern from the software buffer to the hardware registers.
> > 
> > Signed-off-by: Sanjeev Sharma <Sanjeev_Sharma@mentor.com>
> > ---
> > Changes in v2:
> >   - incorporated review comment as per Sergei. 
> > 
> >  drivers/net/ethernet/freescale/gianfar_ethtool.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> > index 3e1a9c1..347d5ee 100644
> > --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
> > +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
> > @@ -1586,7 +1586,7 @@ static int gfar_write_filer_table(struct gfar_private *priv,
> >  		return -EBUSY;
> >  
> >  	/* Fill regular entries */
> > -	for (; i < MAX_FILER_IDX - 1 && (tab->fe[i].ctrl | tab->fe[i].ctrl);
> > +	for (; i < MAX_FILER_IDX - 1 && i < tab->fe[i].ctrl);
> >  	     i++)
> >  		gfar_write_filer(priv, i, tab->fe[i].ctrl, tab->fe[i].prop);
> >  	/* Fill the rest with fall-troughs */
> 
> This makes no sense. Have you tried to compile this?

I have no idea what this code is trying to do, but most likely the
author's intent was to break out of the loop when both ctrl and prop are 0:

diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 3e1a9c1a67a9..fda12fb32ec7 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -1586,7 +1586,7 @@ static int gfar_write_filer_table(struct gfar_private *priv,
 		return -EBUSY;
 
 	/* Fill regular entries */
-	for (; i < MAX_FILER_IDX - 1 && (tab->fe[i].ctrl | tab->fe[i].ctrl);
+	for (; i < MAX_FILER_IDX - 1 && (tab->fe[i].ctrl | tab->fe[i].prop);
 	     i++)
 		gfar_write_filer(priv, i, tab->fe[i].ctrl, tab->fe[i].prop);
 	/* Fill the rest with fall-troughs */
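
The difference between the two loop conditions can be sketched in userspace. This is only an illustration under stated assumptions: struct filer_entry and both helper functions are hypothetical stand-ins, not the gianfar driver's types. Since (ctrl | ctrl) is just ctrl, the original condition never looks at prop and stops at the first entry whose ctrl is 0, even when prop is set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one filer table entry (assumption, not driver code). */
struct filer_entry {
	uint32_t ctrl;
	uint32_t prop;
};

/* Original condition: (ctrl | ctrl) == ctrl, so prop is never consulted. */
size_t count_regular_old(const struct filer_entry *fe, size_t max)
{
	size_t i = 0;

	while (i < max && (fe[i].ctrl | fe[i].ctrl))
		i++;
	return i;
}

/* Suggested condition: stop only when both ctrl and prop are 0. */
size_t count_regular_new(const struct filer_entry *fe, size_t max)
{
	size_t i = 0;

	while (i < max && (fe[i].ctrl | fe[i].prop))
		i++;
	return i;
}
```

With a table whose second entry has ctrl == 0 but prop != 0, the old condition would stop after one entry while the suggested one keeps going until an entry with both fields zero.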

^ permalink raw reply related

* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
From: Michael Chan @ 2015-01-13  6:49 UTC (permalink / raw)
  To: Peter Hurley; +Cc: Prashant Sreedharan, netdev, Linux kernel
In-Reply-To: <54B46DD5.9050802@hurleysoftware.com>

On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote: 
> [   17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
> [   17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
> [   17.203092] 2 locks held by ip/1106:
> [   17.205255]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
> [   17.207445]  #1:  (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
> [   17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
> [   17.211900] Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
> [   17.214086]  0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
> [   17.216265]  ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
> [   17.218446]  ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
> [   17.220636] Call Trace:
> [   17.222743]  [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
> [   17.224808]  [<ffffffff810998a5>] ___might_sleep+0x105/0x140
> [   17.226842]  [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
> [   17.228869]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
> [   17.230939]  [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
> [   17.232967]  [<ffffffffa00ed000>] ? 0xffffffffa00ed000
> [   17.234991]  [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
> [   17.236988]  [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3] 

tp->lock is held in this code path.  If synchronize_irq() sleeps in
wait_event(desc->wait_for_threads, ...), we'll get the warning.

The synchronize_irq() call is to wait for any tg3 irq handler to finish
so that it is guaranteed that next time it will see the CHIP_RESETTING
flag and do nothing.

Not sure if we can drop the tp->lock before we call synchronize_irq()
and then take it again after synchronize_irq().

^ permalink raw reply

* Re: why are IPv6 addresses removed on link down
From: Stephen Hemminger @ 2015-01-13  7:10 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev@vger.kernel.org
In-Reply-To: <54B4A7E4.7030301@gmail.com>

On Mon, 12 Jan 2015 22:06:44 -0700
David Ahern <dsahern@gmail.com> wrote:

> We noticed that IPv6 addresses are removed on a link down. e.g.,
>    ip link set dev eth1
> 
> 
> Looking at the code it appears to be this code path in addrconf.c:
> 
>          case NETDEV_DOWN:
>          case NETDEV_UNREGISTER:
>                  /*
>                   *      Remove all addresses from this interface.
>                   */
>                  addrconf_ifdown(dev, event != NETDEV_DOWN);
>                  break;
> 
> IPv4 addresses are NOT removed on a link down. Is there a particular 
> reason IPv6 addresses are?
> 
> Thanks,
> David

See the RFCs that describe how IPv6 performs Duplicate Address Detection (DAD).
An address is not valid while the link is down, since DAD is not possible.

^ permalink raw reply

* Re: Fwd: [rhashtable] WARNING: CPU: 0 PID: 10 at kernel/locking/mutex.c:570 mutex_lock_nested()
From: Ying Xue @ 2015-01-13  7:50 UTC (permalink / raw)
  To: Thomas Graf; +Cc: linux-kernel, lkp, Netdev
In-Reply-To: <20150112124216.GA26570@casper.infradead.org>

On 01/12/2015 08:42 PM, Thomas Graf wrote:
> On 01/12/15 at 09:38am, Ying Xue wrote:
>> Hi Thomas,
>>
>> I am really unable to see what is going wrong that leads to the warning
>> below. Can you please help me check it?
> 
> Not sure yet. It's not your patch that introduced the issue though.
> It merely exposed the affected code path.
>
> Just wondering, did you test with CONFIG_DEBUG_MUTEXES enabled?
> 
> 

After enabling the above option, I don't see similar complaints during my
testing.

Regards,
Ying

^ permalink raw reply

* [PATCH] neighbour: fix base_reachable_time(_ms) not effective immediately when changed
From: Jean-Francois Remy @ 2015-01-13  7:51 UTC (permalink / raw)
  To: netdev; +Cc: Jean-Francois Remy

When base_reachable_time or base_reachable_time_ms is set through
sysctl or netlink, the reachable_time value is not updated.
This means that neighbour entries continue to be updated using the
old value until it is recomputed in neigh_periodic_work (which
recomputes the value every 300*HZ jiffies, i.e. every 300 seconds).
In practice it can take up to 5 minutes before the change takes
effect.

This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.

Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour that provides its own handlers must
refresh reachable_time.

Signed-off-by: Jean-Francois Remy <jeff@melix.org>
---
 net/core/neighbour.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8e38f17..b6d1d46 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2043,6 +2043,13 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh)
 			case NDTPA_BASE_REACHABLE_TIME:
 				NEIGH_VAR_SET(p, BASE_REACHABLE_TIME,
 					      nla_get_msecs(tbp[i]));
+				/*
+				 * update reachable_time as well, otherwise, the change will
+				 * only be effective after the next time neigh_periodic_work
+				 * decides to recompute it (can be multiple minutes)
+				 */
+				p->reachable_time =
+					neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
 				break;
 			case NDTPA_GC_STALETIME:
 				NEIGH_VAR_SET(p, GC_STALETIME,
@@ -2921,6 +2928,32 @@ static int neigh_proc_dointvec_unres_qlen(struct ctl_table *ctl, int write,
 	return ret;
 }
 
+static int neigh_proc_base_reachable_time(struct ctl_table *ctl, int write,
+					  void __user *buffer,
+					  size_t *lenp, loff_t *ppos)
+{
+	struct neigh_parms *p = ctl->extra2;
+	int ret;
+
+	if (strcmp(ctl->procname, "base_reachable_time") == 0)
+		ret = neigh_proc_dointvec_jiffies(ctl, write, buffer, lenp, ppos);
+	else if (strcmp(ctl->procname, "base_reachable_time_ms") == 0)
+		ret = neigh_proc_dointvec_ms_jiffies(ctl, write, buffer, lenp, ppos);
+	else
+		ret = -1;
+
+	if (write && ret == 0 ) {
+		/*
+		 * update reachable_time as well, otherwise, the change will
+		 * only be effective after the next time neigh_periodic_work
+		 * decides to recompute it
+		 */
+		p->reachable_time =
+			neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
+	}
+	return ret;
+}
+
 #define NEIGH_PARMS_DATA_OFFSET(index)	\
 	(&((struct neigh_parms *) 0)->data[index])
 
@@ -3047,6 +3080,20 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 		t->neigh_vars[NEIGH_VAR_RETRANS_TIME_MS].proc_handler = handler;
 		/* ReachableTime (in milliseconds) */
 		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].proc_handler = handler;
+	} else {
+		/*
+		 * Those handlers will update p->reachable_time after
+		 * base_reachable_time(_ms) is set to ensure the new timer starts being
+		 * applied after the next neighbour update instead of waiting for
+		 * neigh_periodic_work to update its value (can be multiple minutes)
+		 * So any handler that replaces them should do this as well
+		 */
+		/* ReachableTime */
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME].proc_handler =
+			neigh_proc_base_reachable_time;
+		/* ReachableTime (in milliseconds) */
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].proc_handler =
+			neigh_proc_base_reachable_time;
 	}
 
 	/* Don't export sysctls to unprivileged users */
-- 
2.1.0
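
The effect of the patch can be modelled in userspace with a short sketch. Everything named here is an assumption for illustration: struct parms_model, rand_reach_time() and set_base_reachable_time() are hypothetical stand-ins, and the randomisation (a value in [base/2, 3*base/2)) reflects my understanding of neigh_rand_reach_time(), using rand() in place of the kernel's random source.

```c
#include <stdlib.h>

/* Hypothetical model of the two fields involved (not struct neigh_parms). */
struct parms_model {
	unsigned long base_reachable_time;
	unsigned long reachable_time;
};

/* Assumed behaviour: a random value in [base/2, 3*base/2), or 0 if base is 0. */
unsigned long rand_reach_time(unsigned long base)
{
	return base ? ((unsigned long)rand() % base) + (base >> 1) : 0;
}

/*
 * Setter that refreshes the derived value right away, instead of leaving
 * it stale until a periodic job recomputes it.
 */
void set_base_reachable_time(struct parms_model *p, unsigned long base)
{
	p->base_reachable_time = base;
	p->reachable_time = rand_reach_time(base);
}
```

The design point is the same as in the patch: whoever writes the base value is also responsible for refreshing the value derived from it.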

^ permalink raw reply related

* Re: [PATCH net-next v11 3/3] net: hisilicon: new hip04 ethernet driver
From: Ding Tianhong @ 2015-01-13  7:55 UTC (permalink / raw)
  To: Alexander Graf, arnd, robh+dt, davem, grant.likely
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <54B41192.3030400@suse.de>

On 2015/1/13 2:25, Alexander Graf wrote:
> On 12.01.15 09:03, Ding Tianhong wrote:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>>
>> v11: Add ethtool support for getting and setting tx coalesce parameters.
>> xmit_more is not supported in this patch, but I think it could work for hip04;
>> support will be added later, after some performance tests.
>>
>> Here are some performance test results from ping and iperf (with tx_coalesce_frames/usecs added);
>> both throughput and latency look better with tx_coalesce_frames/usecs.
>>
>> - Before:
>> $ ping 192.168.1.1 ...
>> --- 192.168.1.1 ping statistics ---
> 
> Writing --- directly into your patch description is usually a pretty bad
> idea. Git am cuts off everything that comes after --- so your patch
> description ends here without manual intervention ;).
> 
>> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>>
>> $ iperf -c 192.168.1.1 ...
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
>>
>> - After:
>> $ ping 192.168.1.1 ...
>> --- 192.168.1.1 ping statistics ---
>> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>>
>> $ iperf -c 192.168.1.1 ...
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec
>>
>> v10: Following David Miller's and Arnd Bergmann's suggestions, add some modifications
> 
> Version history however should go after a --- line, so that it doesn't
> show up in the patch description in the tree.
> 

ok

>> for v9 version
>> - drop the workqueue
>> - batch cleanup based on tx_coalesce_frames/usecs for better throughput
>> - use a reasonable default tx timeout (200us, could be shortened
>>   based on measurements) with a range timer
>> - fix napi poll function return value
>> - use a lockless queue for cleanup
>>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
> 
> 
> [...]
> 
>> +static int hip04_remove(struct platform_device *pdev)
>> +{
>> +	struct net_device *ndev = platform_get_drvdata(pdev);
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	struct device *d = &pdev->dev;
>> +
>> +	if (priv->phy)
>> +		phy_disconnect(priv->phy);
>> +
>> +	hip04_free_ring(ndev, d);
>> +	unregister_netdev(ndev);
>> +	free_irq(ndev->irq, ndev);
>> +	of_node_put(priv->phy_node);
>> +	cancel_work_sync(&priv->tx_timeout_task);
>> +	free_netdev(ndev);
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct of_device_id hip04_mac_match[] = {
>> +	{ .compatible = "hisilicon,hip04-mac" },
>> +	{ }
>> +};
> 
> This is missing
> 
> MODULE_DEVICE_TABLE(of, hip04_mac_match);
> 
> to enable automatic module loading, no?
> 
looks good to me, thanks.

Ding

> 
> Alex
> 
> 
> .
> 
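
Alexander's point about "---" can be illustrated with a deliberately simplified model. This is a sketch under assumptions, not git's actual parser (which is more selective about what counts as a separator): message_cutoff() is a hypothetical helper that treats the first line starting with "---" as the end of the commit message.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified model: return how many bytes of msg precede the first line
 * that starts with "---". If there is no such line, the whole message is kept.
 */
size_t message_cutoff(const char *msg)
{
	const char *line = msg;

	while (*line) {
		const char *nl;

		if (strncmp(line, "---", 3) == 0)
			return (size_t)(line - msg);
		nl = strchr(line, '\n');
		if (!nl)
			break;
		line = nl + 1;
	}
	return strlen(msg);
}
```

Under this model, a ping banner such as "--- 192.168.1.1 ping statistics ---" pasted into the description ends the message at that line, which is why such output needs to be indented or reworded.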

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Nicolas Dichtel @ 2015-01-13  8:29 UTC (permalink / raw)
  To: David Miller
  Cc: tgraf, jesse, stephen, pshelar, therbert, alexei.starovoitov,
	netdev, dev
In-Reply-To: <20150112.125929.266649587719513897.davem@davemloft.net>

Le 12/01/2015 18:59, David Miller a écrit :
>
> Can you PLEASE, PLEASE, not quote an entire full patch just to add two
> lines of commentary.
>
> Quote _only_ the _RELEVANT_ portions of the email you are replying to.
>
Will do, sorry for that.

^ permalink raw reply

* Re: Fwd: [rhashtable] WARNING: CPU: 0 PID: 10 at kernel/locking/mutex.c:570 mutex_lock_nested()
From: Thomas Graf @ 2015-01-13  8:41 UTC (permalink / raw)
  To: Ying Xue; +Cc: linux-kernel, lkp, Netdev
In-Reply-To: <54B4CE3E.8020009@windriver.com>

On 01/13/15 at 03:50pm, Ying Xue wrote:
> On 01/12/2015 08:42 PM, Thomas Graf wrote:
> > On 01/12/15 at 09:38am, Ying Xue wrote:
> >> Hi Thomas,
> >>
> >> I am really unable to see what is going wrong that leads to the warning
> >> below. Can you please help me check it?
> > 
> > Not sure yet. It's not your patch that introduced the issue though.
> > It merely exposed the affected code path.
> >
> > Just wondering, did you test with CONFIG_DEBUG_MUTEXES enabled?
> > 
> > 
> 
> After enabling the above option, I don't see similar complaints during my
> testing.

I can't reproduce it in my KVM box either so far. It looks like a
mutex_lock() on an uninitialized mutex or use after free but I can't
find such a code path so far.

^ permalink raw reply

* [PATCH net-next] rhashtable: unnecessary to use delayed work
From: Ying Xue @ 2015-01-13  9:00 UTC (permalink / raw)
  To: tgraf; +Cc: davem, netdev

When we queue our work item on the global workqueue with
schedule_delayed_work(), its delay parameter is always zero.
Therefore, we should define a normal work item in the rhashtable
structure instead of a delayed work.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/linux/rhashtable.h |    2 +-
 lib/rhashtable.c           |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 9570832..a2562ed 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -119,7 +119,7 @@ struct rhashtable {
 	atomic_t			nelems;
 	atomic_t			shift;
 	struct rhashtable_params	p;
-	struct delayed_work             run_work;
+	struct work_struct		run_work;
 	struct mutex                    mutex;
 	bool                            being_destroyed;
 };
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index ed6ae1a..a7959ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -476,7 +476,7 @@ static void rht_deferred_worker(struct work_struct *work)
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
 
-	ht = container_of(work, struct rhashtable, run_work.work);
+	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
 	tbl = rht_dereference(ht->tbl, ht);
 
@@ -498,7 +498,7 @@ static void rhashtable_wakeup_worker(struct rhashtable *ht)
 	if (tbl == new_tbl &&
 	    ((ht->p.grow_decision && ht->p.grow_decision(ht, size)) ||
 	     (ht->p.shrink_decision && ht->p.shrink_decision(ht, size))))
-		schedule_delayed_work(&ht->run_work, 0);
+		schedule_work(&ht->run_work);
 }
 
 static void __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
@@ -894,7 +894,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
 		get_random_bytes(&ht->p.hash_rnd, sizeof(ht->p.hash_rnd));
 
 	if (ht->p.grow_decision || ht->p.shrink_decision)
-		INIT_DEFERRABLE_WORK(&ht->run_work, rht_deferred_worker);
+		INIT_WORK(&ht->run_work, rht_deferred_worker);
 
 	return 0;
 }
@@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
 
 	mutex_lock(&ht->mutex);
 
-	cancel_delayed_work(&ht->run_work);
+	cancel_work_sync(&ht->run_work);
 	bucket_table_free(rht_dereference(ht->tbl, ht));
 
 	mutex_unlock(&ht->mutex);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next] tipc: remove redundant timer defined in tipc_sock struct
From: Ying Xue @ 2015-01-13  9:07 UTC (permalink / raw)
  To: davem; +Cc: jon.maloy, netdev, ericalp, Paul.Gortmaker, tipc-discussion

Remove the redundant timer defined in the tipc_sock structure; instead, we
can directly reuse the sk_timer defined in the sock structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 2cec496..c9c34a6 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -69,7 +69,6 @@
  * @pub_count: total # of publications port has made during its lifetime
  * @probing_state:
  * @probing_intv:
- * @timer:
  * @port: port - interacts with 'sk' and with the rest of the TIPC stack
  * @peer_name: the peer of the connection, if any
  * @conn_timeout: the time we can wait for an unresponded setup request
@@ -94,7 +93,6 @@ struct tipc_sock {
 	u32 pub_count;
 	u32 probing_state;
 	unsigned long probing_intv;
-	struct timer_list timer;
 	uint conn_timeout;
 	atomic_t dupl_rcvcnt;
 	bool link_cong;
@@ -360,7 +358,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 		return -EINVAL;
 	}
 	msg_set_origport(msg, tsk->portid);
-	setup_timer(&tsk->timer, tipc_sk_timeout, (unsigned long)tsk);
+	setup_timer(&sk->sk_timer, tipc_sk_timeout, (unsigned long)tsk);
 	sk->sk_backlog_rcv = tipc_backlog_rcv;
 	sk->sk_rcvbuf = sysctl_tipc_rmem[1];
 	sk->sk_data_ready = tipc_data_ready;
@@ -514,7 +512,8 @@ static int tipc_release(struct socket *sock)
 
 	tipc_sk_withdraw(tsk, 0, NULL);
 	probing_state = tsk->probing_state;
-	if (del_timer_sync(&tsk->timer) && probing_state != TIPC_CONN_PROBING)
+	if (del_timer_sync(&sk->sk_timer) &&
+	    probing_state != TIPC_CONN_PROBING)
 		sock_put(sk);
 	tipc_sk_remove(tsk);
 	if (tsk->connected) {
@@ -1136,7 +1135,8 @@ static int tipc_send_packet(struct kiocb *iocb, struct socket *sock,
 static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
 				u32 peer_node)
 {
-	struct net *net = sock_net(&tsk->sk);
+	struct sock *sk = &tsk->sk;
+	struct net *net = sock_net(sk);
 	struct tipc_msg *msg = &tsk->phdr;
 
 	msg_set_destnode(msg, peer_node);
@@ -1148,8 +1148,7 @@ static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
 	tsk->probing_intv = CONN_PROBING_INTERVAL;
 	tsk->probing_state = TIPC_CONN_OK;
 	tsk->connected = 1;
-	if (!mod_timer(&tsk->timer, jiffies + tsk->probing_intv))
-		sock_hold(&tsk->sk);
+	sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);
 	tipc_node_add_conn(net, peer_node, tsk->portid, peer_port);
 	tsk->max_pkt = tipc_node_get_mtu(net, peer_node, tsk->portid);
 }
@@ -2141,8 +2140,7 @@ static void tipc_sk_timeout(unsigned long data)
 				      0, peer_node, tn->own_addr,
 				      peer_port, tsk->portid, TIPC_OK);
 		tsk->probing_state = TIPC_CONN_PROBING;
-		if (!mod_timer(&tsk->timer, jiffies + tsk->probing_intv))
-			sock_hold(sk);
+		sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);
 	}
 	bh_unlock_sock(sk);
 	if (skb)
-- 
1.7.9.5
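
The open-coded pattern this patch replaces, "if (!mod_timer(...)) sock_hold(sk);", can be mocked in userspace to show the reference counting involved. All names below (mock_timer, mock_sock, mock_mod_timer, mock_sk_reset_timer) are hypothetical stand-ins, not kernel APIs; the sketch assumes only that mod_timer() returns 0 when the timer was inactive and 1 when it was already pending.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the kernel structures. */
struct mock_timer {
	bool pending;
	unsigned long expires;
};

struct mock_sock {
	int refcnt;
	struct mock_timer timer;
};

/* Models mod_timer(): returns 1 if the timer was already pending, else 0. */
int mock_mod_timer(struct mock_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->pending = true;
	t->expires = expires;
	return was_pending;
}

/* Take a reference only when the timer goes from idle to pending. */
void mock_sk_reset_timer(struct mock_sock *sk, unsigned long expires)
{
	if (!mock_mod_timer(&sk->timer, expires))
		sk->refcnt++;	/* stands in for sock_hold() */
}
```

Re-arming an already pending timer only moves its expiry; the socket is held once per armed timer, which is why the helper can replace the two open-coded call sites.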



^ permalink raw reply related

* [PATCH net-next v12 0/3] add hisilicon hip04 ethernet driver
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux

v12:
- Following Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
  for hip04 ethernet.

v11:
- Add ethtool support for getting and setting tx coalesce parameters.
  xmit_more is not supported in this patch, but I think it could work for hip04;
  support will be added later, after some performance tests.

  Here are some performance test results from ping and iperf (with tx_coalesce_frames/usecs added);
  both throughput and latency look better with tx_coalesce_frames/usecs.

  - Before:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

  - After:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10:
- Following Arnd's suggestion, remove the skb_orphan call, use an hrtimer
  for the cleanup of the TX queue, and make some modifications to the hip04
  driver.
  1) drop the broken skb_orphan call
  2) drop the workqueue
  3) batch cleanup based on tx_coalesce_frames/usecs for better throughput
  4) use a reasonable default tx timeout (200us, could be shortened
     based on measurements) with a range timer
  5) fix napi poll function return value
  6) use a lockless queue for cleanup

v9:
- There are no tx completion interrupts to free DMAed Tx packets, which means that
  we rely on new tx packets arriving to run the destructors of completed packets,
  which opens up space in their sockets' send queues. Sometimes we don't get such
  new packets, causing Tx to stall; a single UDP transmitter is a good example of
  this situation. So we need a cleanup workqueue to reclaim completed packets;
  the workqueue only frees the last packets, which have already been pending for several jiffies.
  Also includes some formatting cleanups.

v8:
- Use poll to reclaim transmitted buffers as a workaround, since there is no tx done interrupt

v7:
- Remove select NET_CORE in 0002

v6:
- Suggested by Russell: use netdev_sent_queue & netdev_completed_queue to solve the latency issue.
  Also shorten the period of the timer, which is used to wake up the queue since there is no
  tx completed interrupt.

v5:
- no big change, fix typo

v4:
- Modify according to the suggestions from Arnd, Florian, Eric, David
  Use of_parse_phandle_with_fixed_args & syscon_node_to_regmap to get ppe info
  Add skb_orphan() and tx_timer for reclaim since there is no tx_finished interrupt
  Update timeout, and move of_phy_connect to probe to reuse open/stop

v3:
- Suggested by Arnd: use syscon & regmap_write/read to replace static void __iomem *ppebase.
  Modify hisilicon-hip04-net.txt according to suggestions from Florian and Sergei.

v2:
- Got many suggestions from Russell, Arnd, Florian, Mark and Sergei
  Remove memcpy, use dma_map/unmap_single, use dma_alloc_coherent rather than dma_pool, etc.
  Refer property in ethernet.txt, change ppe description, etc.
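
The v11 ping and iperf figures above can be turned into relative changes with a one-line helper. This is only a worked-arithmetic sketch: percent_change() is a hypothetical name, and the inputs are the single samples quoted in the changelog, so the results say nothing about run-to-run variance.

```c
/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Applied to the quoted numbers, percent_change(945.0, 965.0) gives the change in iperf bandwidth (about +2.1%) and percent_change(0.202, 0.190) gives the change in average ping rtt (about -5.9%).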

Ding Tianhong (1):
  net: hisilicon: new hip04 ethernet driver

Zhangfei Gao (2):
  Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  net: hisilicon: new hip04 MDIO driver

 .../bindings/net/hisilicon-hip04-net.txt           |  88 ++
 drivers/net/ethernet/hisilicon/Kconfig             |   9 +
 drivers/net/ethernet/hisilicon/Makefile            |   1 +
 drivers/net/ethernet/hisilicon/hip04_eth.c         | 968 +++++++++++++++++++++
 drivers/net/ethernet/hisilicon/hip04_mdio.c        | 186 ++++
 5 files changed, 1252 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

-- 
1.8.0



^ permalink raw reply

* [PATCH net-next v12 2/3] net: hisilicon: new hip04 MDIO driver
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421140290-5492-1-git-send-email-dingtianhong@huawei.com>

From: Zhangfei Gao <zhangfei.gao@linaro.org>

Hisilicon hip04 platform MDIO driver.
Reuses the Marvell PHY driver, drivers/net/phy/marvell.c.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/Kconfig      |   9 ++
 drivers/net/ethernet/hisilicon/Makefile     |   1 +
 drivers/net/ethernet/hisilicon/hip04_mdio.c | 186 ++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
index e942173..a54d897 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -24,4 +24,13 @@ config HIX5HD2_GMAC
 	help
 	  This selects the hix5hd2 mac family network device.
 
+config HIP04_ETH
+	tristate "HISILICON P04 Ethernet support"
+	select PHYLIB
+	select MARVELL_PHY
+	select MFD_SYSCON
+	---help---
+	  If you wish to compile a kernel for hardware with a Hisilicon P04 SoC and
+	  want to use the internal Ethernet, then you should answer Y to this.
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 9175e846..40115a7 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c b/drivers/net/ethernet/hisilicon/hip04_mdio.c
new file mode 100644
index 0000000..b3bac25
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_mdio.c
@@ -0,0 +1,186 @@
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
+#include <linux/of_mdio.h>
+#include <linux/delay.h>
+
+#define MDIO_CMD_REG		0x0
+#define MDIO_ADDR_REG		0x4
+#define MDIO_WDATA_REG		0x8
+#define MDIO_RDATA_REG		0xc
+#define MDIO_STA_REG		0x10
+
+#define MDIO_START		BIT(14)
+#define MDIO_R_VALID		BIT(1)
+#define MDIO_READ	        (BIT(12) | BIT(11) | MDIO_START)
+#define MDIO_WRITE	        (BIT(12) | BIT(10) | MDIO_START)
+
+struct hip04_mdio_priv {
+	void __iomem *base;
+};
+
+#define WAIT_TIMEOUT 10
+static int hip04_mdio_wait_ready(struct mii_bus *bus)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	int i;
+
+	for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) {
+		if (i == WAIT_TIMEOUT)
+			return -ETIMEDOUT;
+		msleep(20);
+	}
+
+	return 0;
+}
+
+static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	val = regnum | (mii_id << 5) | MDIO_READ;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	val = readl_relaxed(priv->base + MDIO_STA_REG);
+	if (val & MDIO_R_VALID) {
+		dev_err(bus->parent, "SMI bus read not valid\n");
+		ret = -ENODEV;
+		goto out;
+	}
+
+	val = readl_relaxed(priv->base + MDIO_RDATA_REG);
+	ret = val & 0xFFFF;
+out:
+	return ret;
+}
+
+static int hip04_mdio_write(struct mii_bus *bus, int mii_id,
+			    int regnum, u16 value)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	writel_relaxed(value, priv->base + MDIO_WDATA_REG);
+	val = regnum | (mii_id << 5) | MDIO_WRITE;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+out:
+	return ret;
+}
+
+static int hip04_mdio_reset(struct mii_bus *bus)
+{
+	int temp, i;
+
+	for (i = 0; i < PHY_MAX_ADDR; i++) {
+		hip04_mdio_write(bus, i, 22, 0);
+		temp = hip04_mdio_read(bus, i, MII_BMCR);
+		if (temp < 0)
+			continue;
+
+		temp |= BMCR_RESET;
+		if (hip04_mdio_write(bus, i, MII_BMCR, temp) < 0)
+			continue;
+	}
+
+	mdelay(500);
+	return 0;
+}
+
+static int hip04_mdio_probe(struct platform_device *pdev)
+{
+	struct resource *r;
+	struct mii_bus *bus;
+	struct hip04_mdio_priv *priv;
+	int ret;
+
+	bus = mdiobus_alloc_size(sizeof(struct hip04_mdio_priv));
+	if (!bus) {
+		dev_err(&pdev->dev, "Cannot allocate MDIO bus\n");
+		return -ENOMEM;
+	}
+
+	bus->name = "hip04_mdio_bus";
+	bus->read = hip04_mdio_read;
+	bus->write = hip04_mdio_write;
+	bus->reset = hip04_mdio_reset;
+	snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(&pdev->dev));
+	bus->parent = &pdev->dev;
+	priv = bus->priv;
+
+	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(&pdev->dev, r);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto out_mdio;
+	}
+
+	ret = of_mdiobus_register(bus, pdev->dev.of_node);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
+		goto out_mdio;
+	}
+
+	platform_set_drvdata(pdev, bus);
+
+	return 0;
+
+out_mdio:
+	mdiobus_free(bus);
+	return ret;
+}
+
+static int hip04_mdio_remove(struct platform_device *pdev)
+{
+	struct mii_bus *bus = platform_get_drvdata(pdev);
+
+	mdiobus_unregister(bus);
+	mdiobus_free(bus);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mdio_match[] = {
+	{ .compatible = "hisilicon,hip04-mdio" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, hip04_mdio_match);
+
+static struct platform_driver hip04_mdio_driver = {
+	.probe = hip04_mdio_probe,
+	.remove = hip04_mdio_remove,
+	.driver = {
+		.name = "hip04-mdio",
+		.owner = THIS_MODULE,
+		.of_match_table = hip04_mdio_match,
+	},
+};
+
+module_platform_driver(hip04_mdio_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 MDIO interface driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-mdio");
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next v12 3/3] net: hisilicon: new hip04 ethernet driver
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421140290-5492-1-git-send-email-dingtianhong@huawei.com>

Add an ethernet driver for the Hisilicon hip04 SoC, covering the 100M / 1000M controller.
The controller has no tx done interrupt, so transmitted buffers are reclaimed in the NAPI poll.
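
Because there is no completion interrupt, reclaim has to work out which
descriptors the hardware has finished by inspecting the ring itself. The
sketch below is a simplified user-space model of that idea, not the driver
code. struct tx_slot, tx_reclaim() and tx_reclaim_demo() are hypothetical
names, and it assumes a descriptor's send_addr reads back as 0 once the
frame has been sent. The count is reduced modulo the ring size, so a ring
holding 255 frames reports 255 rather than wrapping to 0.

```c
#include <assert.h>
#include <stdint.h>

#define TX_DESC_NUM	256u	/* ring size, a power of two */
#define TX_NEXT(n)	(((n) + 1u) & (TX_DESC_NUM - 1u))

/* Simplified descriptor: only the field the reclaim loop inspects. */
struct tx_slot {
	uint32_t send_addr;	/* assumed to read 0 once the frame is sent */
};

/* Descriptors in flight between tail and head; unsigned wrap is intended. */
unsigned int tx_count(unsigned int head, unsigned int tail)
{
	return (head - tail) % TX_DESC_NUM;
}

/*
 * Walk from *tail towards head and stop at the first slot that is still
 * pending. Advances *tail and returns the number of slots reclaimed.
 */
unsigned int tx_reclaim(const struct tx_slot *ring, unsigned int head,
			unsigned int *tail)
{
	unsigned int pending = tx_count(head, *tail);
	unsigned int done = 0;

	while (pending && ring[*tail].send_addr == 0) {
		*tail = TX_NEXT(*tail);
		pending--;
		done++;
	}
	return done;
}

/* Six frames queued across the wrap point; the fifth (slot 2) is pending. */
unsigned int tx_reclaim_demo(void)
{
	static struct tx_slot ring[TX_DESC_NUM];	/* zeroed: all "sent" */
	unsigned int tail = 254, head = 4;
	unsigned int done;

	ring[2].send_addr = 0x1000;	/* not yet completed */
	done = tx_reclaim(ring, head, &tail);
	assert(tail == 2);		/* stopped at the pending slot */
	return done;
}
```

In this model tx_reclaim_demo() reclaims slots 254, 255, 0 and 1 and then
stops, leaving two frames in flight for a later poll.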

v12: According to Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
     for hip04 ethernet.

v11: Add ethtool support for getting and setting tx coalesce parameters. xmit_more
     is not supported in this patch, but I think it could work for hip04;
     support will be added later, after some performance tests.

     Here are some performance test results from ping and iperf (with tx_coalesce_frames/usecs added);
     both throughput and latency look better with tx_coalesce_frames/usecs.

     - Before:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

     - After:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10: According to David Miller and Arnd Bergmann's suggestions, make the following
     changes relative to v9:
     - drop the workqueue
     - batch cleanup based on tx_coalesce_frames/usecs for better throughput
     - use a reasonable default tx timeout (200us, could be shortened
       based on measurements) with a range timer
     - fix napi poll function return value
     - use a lockless queue for cleanup
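
The v10 items above combine a frame threshold with a timer. As a rough
illustration only, the sketch below models the two pieces in plain C: how
many 1500-byte frames fit in a 200us window at gigabit rate, and the
decision xmit makes after queuing a frame. frames_per_window() and
tx_cleanup_action() are hypothetical names, and the arithmetic ignores
preamble, inter-frame gap and header overhead.

```c
#include <stdint.h>

/*
 * Frames of frame_bytes (must be non-zero) that fit on the wire within
 * usecs at rate_mbps. 1 Mbit/s is 1 bit per microsecond, so
 * usecs * rate_mbps is the number of bits in the window.
 */
unsigned int frames_per_window(unsigned int usecs, unsigned int rate_mbps,
			       unsigned int frame_bytes)
{
	uint64_t bits = (uint64_t)usecs * rate_mbps;

	return (unsigned int)(bits / (8u * frame_bytes));
}

enum tx_cleanup {
	TX_CLEANUP_NOW,		/* threshold reached: schedule the poll */
	TX_ARM_TIMER,		/* below threshold, no timer pending */
	TX_NOTHING,		/* below threshold, timer already queued */
};

/* Decide what to do after one more frame has been queued. */
enum tx_cleanup tx_cleanup_action(unsigned int in_flight,
				  unsigned int coalesce_frames,
				  int timer_queued)
{
	if (in_flight >= coalesce_frames)
		return TX_CLEANUP_NOW;
	if (!timer_queued)
		return TX_ARM_TIMER;
	return TX_NOTHING;
}
```

With these assumptions frames_per_window(200, 1000, 1500) gives 16, which
matches the "16 frames of 1500 bytes" estimate used for the default timeout.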

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/Makefile    |   2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c | 968 +++++++++++++++++++++++++++++
 2 files changed, 969 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 40115a7..6c14540 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,4 +3,4 @@
 #
 
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..a50530d
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,968 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/ktime.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x08
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_REG		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_CPU_ADD_ADDR		0x580
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define RCV_DROP			BIT(7)
+#define TX_DROP				BIT(6)
+#define DEF_INT_ERR			(RCV_NOBUF | RCV_DROP | TX_DROP)
+#define DEF_INT_MASK			(RCV_INT | DEF_INT_ERR)
+
+/* TX descriptor config */
+#define TX_FREE_MEM			BIT(0)
+#define TX_READ_ALLOC_L3		BIT(1)
+#define TX_FINISH_CACHE_INV		BIT(2)
+#define TX_CLEAR_WB			BIT(4)
+#define TX_L3_CHECKSUM			BIT(5)
+#define TX_LOOP_BACK			BIT(11)
+
+/* RX error */
+#define RX_PKT_DROP			BIT(0)
+#define RX_L2_ERR			BIT(1)
+#define RX_PKT_ERR			(RX_PKT_DROP | RX_L2_ERR)
+
+#define SGMII_SPEED_1000		0x08
+#define SGMII_SPEED_100			0x07
+#define SGMII_SPEED_10			0x06
+#define MII_SPEED_100			0x01
+#define MII_SPEED_10			0x00
+
+#define GE_DUPLEX_FULL			BIT(0)
+#define GE_DUPLEX_HALF			0x00
+#define GE_MODE_CHANGE_EN		BIT(0)
+
+#define GE_TX_AUTO_NEG			BIT(5)
+#define GE_TX_ADD_CRC			BIT(6)
+#define GE_TX_SHORT_PAD_THROUGH		BIT(7)
+
+#define GE_RX_STRIP_CRC			BIT(0)
+#define GE_RX_STRIP_PAD			BIT(3)
+#define GE_RX_PAD_EN			BIT(4)
+
+#define GE_AUTO_NEG_CTL			BIT(0)
+
+#define GE_RX_INT_THRESHOLD		BIT(6)
+#define GE_RX_TIMEOUT			0x04
+
+#define GE_RX_PORT_EN			BIT(1)
+#define GE_TX_PORT_EN			BIT(2)
+
+#define PPE_CFG_STS_RX_PKT_CNT_RC	BIT(12)
+
+#define PPE_CFG_RX_PKT_ALIGN		BIT(18)
+#define PPE_CFG_QOS_VMID_MODE		BIT(14)
+#define PPE_CFG_QOS_VMID_GRP_SHIFT	8
+
+#define PPE_CFG_RX_FIFO_FSFU		BIT(11)
+#define PPE_CFG_RX_DEPTH_SHIFT		16
+#define PPE_CFG_RX_START_SHIFT		0
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT	11
+
+#define PPE_CFG_BUS_LOCAL_REL		BIT(14)
+#define PPE_CFG_BUS_BIG_ENDIEN		BIT(0)
+
+#define RX_DESC_NUM			128
+#define TX_DESC_NUM			256
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define GMAC_MIN_PKT_LEN		31
+#define RX_BUF_SIZE			1600
+#define RESET_TIMEOUT			1000
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+#define DRV_VERSION			"v1.0"
+
+#define HIP04_MAX_TX_COALESCE_USECS	200
+#define HIP04_MIN_TX_COALESCE_USECS	100
+#define HIP04_MAX_TX_COALESCE_FRAMES	200
+#define HIP04_MIN_TX_COALESCE_FRAMES	100
+
+struct tx_desc {
+	u32 send_addr;
+	u32 send_size;
+	u32 next_addr;
+	u32 cfg;
+	u32 wb_addr;
+} __aligned(64);
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	unsigned int tx_head;
+
+	int tx_coalesce_frames;
+	int tx_coalesce_usecs;
+	struct hrtimer tx_coalesce_timer;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct work_struct tx_timeout_task;
+
+	/* written only by tx cleanup */
+	unsigned int tx_tail ____cacheline_aligned_in_smp;
+};
+
+static inline unsigned int tx_count(unsigned int head, unsigned int tail)
+{
+	return (head - tail) % TX_DESC_NUM;
+}
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = SGMII_SPEED_1000;
+		else if (speed == SPEED_100)
+			val = SGMII_SPEED_100;
+		else
+			val = SGMII_SPEED_10;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = MII_SPEED_100;
+		else
+			val = MII_SPEED_10;
+		break;
+	default:
+		netdev_warn(ndev, "not supported mode\n");
+		val = MII_SPEED_10;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = GE_MODE_CHANGE_EN;
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp, timeout = 0;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+		if (timeout++ > RESET_TIMEOUT)
+			break;
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= PPE_CFG_STS_RX_PKT_CNT_RC;
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
+	val |= PPE_CFG_QOS_VMID_MODE;
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
+	val |= PPE_CFG_RX_FIFO_FSFU;
+	val |= priv->chan << PPE_CFG_RX_START_SHIFT;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	val = PPE_CFG_RX_PKT_ALIGN;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	val = GMAC_PPE_RX_PKT_MAX_LEN;
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	val = GMAC_MAX_PKT_LEN;
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	val = GMAC_MIN_PKT_LEN;
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	val = GE_RX_STRIP_CRC;
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	val = GE_AUTO_NEG_CTL;
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(DEF_INT_MASK);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+		       priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+		       priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static int hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
+	unsigned int count;
+
+	smp_rmb();
+	count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
+	if (count == 0)
+		goto out;
+
+	while (count) {
+		desc = &priv->tx_desc[tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+					 priv->tx_skb[tx_tail]->len,
+					 DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		pkts_compl++;
+		bytes_compl += priv->tx_skb[tx_tail]->len;
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		count--;
+	}
+
+	priv->tx_tail = tx_tail;
+	smp_wmb(); /* Ensure tx_tail visible to xmit */
+
+out:
+	if (pkts_compl || bytes_compl)
+		netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+	if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
+		netif_wake_queue(ndev);
+
+	return count;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head, count;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	smp_rmb();
+	count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
+	if (count == (TX_DESC_NUM - 1)) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be32(skb->len);
+	desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	count++;
+	netdev_sent_queue(ndev, skb->len);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+
+	/* Ensure tx_head update visible to tx reclaim */
+	smp_wmb();
+
+	/* queue is getting full, better start cleaning up now */
+	if (count >= priv->tx_coalesce_frames) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt and timer */
+			priv->reg_inten &= ~(RCV_INT);
+			writel_relaxed(DEF_INT_MASK & ~RCV_INT,
+				       priv->base + PPE_INTEN);
+			hrtimer_cancel(&priv->tx_coalesce_timer);
+			__napi_schedule(&priv->napi);
+		}
+	} else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
+		/* cleanup not pending yet, start a new timer */
+		hrtimer_start_expires(&priv->tx_coalesce_timer,
+				      HRTIMER_MODE_REL);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	int tx_remaining;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb))
+			net_dbg_ratelimited("build_skb failed\n");
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				 RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (0 == len) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			goto done;
+		phys = dma_map_single(&ndev->dev, buf,
+				      RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			goto done;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			goto done;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (!(priv->reg_inten & RCV_INT)) {
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+	napi_complete(napi);
+done:
+	/* clean up tx descriptors and start a new timer if necessary */
+	tx_remaining = hip04_tx_reclaim(ndev, false);
+	if (rx < budget && tx_remaining)
+		hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *)dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	if (!ists)
+		return IRQ_NONE;
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (unlikely(ists & DEF_INT_ERR)) {
+		if (ists & (RCV_NOBUF | RCV_DROP)) {
+			stats->rx_errors++;
+			stats->rx_dropped++;
+			netdev_err(ndev, "rx drop\n");
+		}
+		if (ists & TX_DROP) {
+			stats->tx_dropped++;
+			netdev_err(ndev, "tx drop\n");
+		}
+	}
+
+	if (ists & RCV_INT && napi_schedule_prep(&priv->napi)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT);
+		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+		hrtimer_cancel(&priv->tx_coalesce_timer);
+		__napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+enum hrtimer_restart tx_done(struct hrtimer *hrtimer)
+{
+	struct hip04_priv *priv;
+	priv = container_of(hrtimer, struct hip04_priv, tx_coalesce_timer);
+
+	if (napi_schedule_prep(&priv->napi)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT);
+		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+		__napi_schedule(&priv->napi);
+	}
+
+	return HRTIMER_NORESTART;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	hip04_reset_ppe(priv);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				      RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netdev_reset_queue(ndev);
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					 RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	schedule_work(&priv->tx_timeout_task);
+}
+
+static void hip04_tx_timeout_task(struct work_struct *work)
+{
+	struct hip04_priv *priv;
+
+	priv = container_of(work, struct hip04_priv, tx_timeout_task);
+	hip04_mac_stop(priv->ndev);
+	hip04_mac_open(priv->ndev);
+}
+
+static struct net_device_stats *hip04_get_stats(struct net_device *ndev)
+{
+	return &ndev->stats;
+}
+
+static int hip04_get_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct hip04_priv *priv = netdev_priv(netdev);
+
+	ec->tx_coalesce_usecs = priv->tx_coalesce_usecs;
+	ec->tx_max_coalesced_frames = priv->tx_coalesce_frames;
+
+	return 0;
+}
+
+static int hip04_set_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct hip04_priv *priv = netdev_priv(netdev);
+
+	/* Check not supported parameters  */
+	if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
+	    (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
+	    (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
+	    (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
+	    (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
+	    (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
+	    (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
+	    (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
+	    (ec->tx_max_coalesced_frames_irq) ||
+	    (ec->stats_block_coalesce_usecs) ||
+	    (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
+		return -EOPNOTSUPP;
+
+	if ((ec->tx_coalesce_usecs > HIP04_MAX_TX_COALESCE_USECS ||
+	     ec->tx_coalesce_usecs < HIP04_MIN_TX_COALESCE_USECS) ||
+	    (ec->tx_max_coalesced_frames > HIP04_MAX_TX_COALESCE_FRAMES ||
+	     ec->tx_max_coalesced_frames < HIP04_MIN_TX_COALESCE_FRAMES))
+		return -EINVAL;
+
+	priv->tx_coalesce_usecs = ec->tx_coalesce_usecs;
+	priv->tx_coalesce_frames = ec->tx_max_coalesced_frames;
+
+	return 0;
+}
+
+static void hip04_get_drvinfo(struct net_device *netdev,
+			      struct ethtool_drvinfo *drvinfo)
+{
+	strlcpy(drvinfo->driver, DRV_NAME, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, DRV_VERSION, sizeof(drvinfo->version));
+}
+
+static struct ethtool_ops hip04_ethtool_ops = {
+	.get_coalesce		= hip04_get_coalesce,
+	.set_coalesce		= hip04_set_coalesce,
+	.get_drvinfo		= hip04_get_drvinfo,
+};
+
+static struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_get_stats		= hip04_get_stats,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			  priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	ktime_t txtime;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1] * RX_DESC_NUM;
+
+	hrtimer_init(&priv->tx_coalesce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+	/*
+	 * BQL will try to keep the TX queue as short as possible, but it can't
+	 * be faster than tx_coalesce_usecs, so we need a fast timeout here,
+	 * but also long enough to gather up enough frames to ensure we don't
+	 * get more interrupts than necessary.
+	 * 200us is enough for 16 frames of 1500 bytes at gigabit ethernet rate
+	 */
+	priv->tx_coalesce_frames = TX_DESC_NUM * 3 / 4;
+	priv->tx_coalesce_usecs = 200;
+	/* allow timer to fire after half the time at the earliest */
+	txtime = ktime_set(0, priv->tx_coalesce_usecs * NSEC_PER_USEC / 2);
+	hrtimer_set_expires_range(&priv->tx_coalesce_timer, txtime, txtime);
+	priv->tx_coalesce_timer.function = tx_done;
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "not find phy-mode\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+			       0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	INIT_WORK(&priv->tx_timeout_task, hip04_tx_timeout_task);
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->ethtool_ops = &hip04_ethtool_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, NAPI_POLL_WEIGHT);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = register_netdev(ndev);
+	if (ret) {
+		netdev_err(ndev, "register_netdev failed\n");
+		goto alloc_fail;
+	}
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	hip04_free_ring(ndev, d);
+	unregister_netdev(ndev);
+	devm_free_irq(d, ndev->irq, ndev);
+	of_node_put(priv->phy_node);
+	cancel_work_sync(&priv->tx_timeout_task);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(of, hip04_mac_match);
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
-- 
1.8.0
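
The coalescing comment in hip04_mac_probe() states that 200us is enough for
16 frames of 1500 bytes at gigabit rate. A minimal sketch of that arithmetic
follows; wire_time_usecs() is a hypothetical helper, not part of the driver,
and it ignores preamble, FCS and inter-frame gap.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the driver): microseconds needed to
 * serialize `frames` frames of `bytes` bytes each at `bits_per_sec`.
 * Framing overhead (preamble, FCS, inter-frame gap) is ignored.
 */
static uint64_t wire_time_usecs(uint64_t frames, uint64_t bytes,
				uint64_t bits_per_sec)
{
	return frames * bytes * 8 * 1000000ULL / bits_per_sec;
}

/*
 * Earliest timer expiry in nanoseconds, mirroring the patch's
 * tx_coalesce_usecs * NSEC_PER_USEC / 2 expression.
 */
static uint64_t coalesce_min_nsecs(uint64_t coalesce_usecs)
{
	return coalesce_usecs * 1000ULL / 2;
}
```

Under these assumptions, 16 * 1500 * 8 = 192000 bits take 192us at 1 Gbit/s,
which fits inside the 200us budget, and the timer may fire after 100000ns at
the earliest.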

^ permalink raw reply related

* [PATCH net-next v12 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
From: Ding Tianhong @ 2015-01-13  9:11 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421140290-5492-1-git-send-email-dingtianhong@huawei.com>

From: Zhangfei Gao <zhangfei.gao@linaro.org>

This patch adds the Device Tree bindings for the Hisilicon hip04
Ethernet controller, covering both the 100M and 1000M controllers.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 .../bindings/net/hisilicon-hip04-net.txt           | 88 ++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
new file mode 100644
index 0000000..988fc69
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -0,0 +1,88 @@
+Hisilicon hip04 Ethernet Controller
+
+* Ethernet controller node
+
+Required properties:
+- compatible: should be "hisilicon,hip04-mac".
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device.
+- port-handle: <phandle port channel>
+	phandle, specifies a reference to the syscon ppe node
+	port, port number connected to the controller
+	channel, index of the recv channel block; the first recv channel is channel * RX_DESC_NUM
+- phy-mode: see ethernet.txt [1].
+
+Optional properties:
+- phy-handle: see ethernet.txt [1].
+
+[1] Documentation/devicetree/bindings/net/ethernet.txt
+
+
+* Ethernet ppe node:
+Controls the rx & tx fifos of all ethernet controllers.
+Has 2048 recv channels shared by all ethernet controllers; ranges must not overlap.
+Each controller's recv channels start from channel * RX_DESC_NUM.
+
+Required properties:
+- compatible: "hisilicon,hip04-ppe", "syscon".
+- reg: address and length of the register set for the device.
+
+
+* MDIO bus node:
+
+Required properties:
+
+- compatible: should be "hisilicon,hip04-mdio".
+- Inherits from MDIO bus node binding [2]
+[2] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+	mdio {
+		compatible = "hisilicon,hip04-mdio";
+		reg = <0x28f1000 0x1000>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		phy0: ethernet-phy@0 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			reg = <0>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+		};
+
+		phy1: ethernet-phy@1 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			reg = <1>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+		};
+	};
+
+	ppe: ppe@28c0000 {
+		compatible = "hisilicon,hip04-ppe", "syscon";
+		reg = <0x28c0000 0x10000>;
+	};
+
+	fe: ethernet@28b0000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x28b0000 0x10000>;
+		interrupts = <0 413 4>;
+		phy-mode = "mii";
+		port-handle = <&ppe 31 0>;
+	};
+
+	ge0: ethernet@2800000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x2800000 0x10000>;
+		interrupts = <0 402 4>;
+		phy-mode = "sgmii";
+		port-handle = <&ppe 0 1>;
+		phy-handle = <&phy0>;
+	};
+
+	ge8: ethernet@2880000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x2880000 0x10000>;
+		interrupts = <0 410 4>;
+		phy-mode = "sgmii";
+		port-handle = <&ppe 8 2>;
+		phy-handle = <&phy1>;
+	};
-- 
1.8.0
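
The binding says each controller's recv channels start at channel *
RX_DESC_NUM and that the 2048 ppe channels are shared without overlap. A
rough sketch of that rule is below. The value RX_DESC_NUM = 64 is an
assumption (the binding does not state it), and first_recv_chan() and
chans_valid() are hypothetical helpers for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_DESC_NUM	64	/* assumed value; not given in the binding */
#define PPE_RX_CHANNELS	2048	/* from the ppe node description */

/* First recv channel for a controller, given the last port-handle cell. */
static unsigned int first_recv_chan(unsigned int channel)
{
	return channel * RX_DESC_NUM;
}

/*
 * True if every channel index fits inside the ppe's channel space and no
 * two controllers use the same index (so their ranges cannot overlap).
 */
static bool chans_valid(const unsigned int *channel, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (channel[i] >= PPE_RX_CHANNELS / RX_DESC_NUM)
			return false;
		for (size_t j = 0; j < i; j++)
			if (channel[j] == channel[i])
				return false;
	}
	return true;
}
```

With the example nodes above (channel cells 0, 1 and 2), the controllers
would start at recv channels 0, 64 and 128 under the assumed RX_DESC_NUM.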

^ permalink raw reply related

* Re: [PATCH net-next] rhashtable: unnecessary to use delayed work
From: Thomas Graf @ 2015-01-13  9:35 UTC (permalink / raw)
  To: Ying Xue; +Cc: davem, netdev
In-Reply-To: <1421139645-1588-1-git-send-email-ying.xue@windriver.com>

On 01/13/15 at 05:00pm, Ying Xue wrote:
> When we put our declared work task in the global workqueue with
> schedule_delayed_work(), its delay parameter is always zero.
> Therefore, we should define a normal work in rhashtable structure
> instead of a delayed work.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>

> @@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
>  
>  	mutex_lock(&ht->mutex);
>  
> -	cancel_delayed_work(&ht->run_work);
> +	cancel_work_sync(&ht->run_work);
>  	bucket_table_free(rht_dereference(ht->tbl, ht));
>  
>  	mutex_unlock(&ht->mutex);

I like the patch!

I think it introduces a possible deadlock though (see below). OTOH, it
could actually explain the reason for the 0day lock debug splat that
was reported.

Deadlock: The worker could already have been kicked off but was
interrupted before it acquired ht->mutex. rhashtable_destroy() is
called and acquires ht->mutex. cancel_work_sync() then waits for the
worker to finish while holding ht->mutex. The worker can't finish
because it needs to acquire ht->mutex to do so.

For the very same reason the reported warning could have been triggered.
Instead of deadlocking, the old code would have called
bucket_table_free() with a deferred resizer still underway.

What about we do something like this?

void rhashtable_destroy(struct rhashtable *ht)
{
	ht->being_destroyed = true;
	cancel_work_sync(&ht->run_work);

	mutex_lock(&ht->mutex);
	bucket_table_free(rht_dereference(ht->tbl, ht));
	mutex_unlock(&ht->mutex);
}

If you agree we can explain this briefly in the commit message and add:
Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
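
For readers outside the kernel, the ordering argued for above can be
sketched in user space. This is only an analogy under stated assumptions:
struct table and its helpers are hypothetical, a pthread stands in for the
deferred worker, and pthread_join() stands in for cancel_work_sync(). The
point it shows is that the wait happens before the mutex is taken, so the
worker can always get the mutex and finish.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct rhashtable. */
struct table {
	pthread_mutex_t mutex;
	pthread_t worker;		/* stands in for ht->run_work */
	atomic_bool being_destroyed;
	int *buckets;
};

/* Deferred "resize": needs the mutex, like the rhashtable worker. */
static void *deferred_resize(void *arg)
{
	struct table *t = arg;

	pthread_mutex_lock(&t->mutex);
	if (!atomic_load(&t->being_destroyed) && t->buckets)
		t->buckets[0]++;	/* pretend to touch the table */
	pthread_mutex_unlock(&t->mutex);
	return NULL;
}

static int table_init(struct table *t)
{
	t->buckets = calloc(16, sizeof(*t->buckets));
	if (!t->buckets)
		return -1;
	pthread_mutex_init(&t->mutex, NULL);
	atomic_init(&t->being_destroyed, false);
	if (pthread_create(&t->worker, NULL, deferred_resize, t)) {
		pthread_mutex_destroy(&t->mutex);
		free(t->buckets);
		t->buckets = NULL;
		return -1;
	}
	return 0;
}

static void table_destroy(struct table *t)
{
	atomic_store(&t->being_destroyed, true);
	/* Wait for the worker first, WITHOUT holding the mutex. */
	pthread_join(t->worker, NULL);

	pthread_mutex_lock(&t->mutex);
	free(t->buckets);
	t->buckets = NULL;
	pthread_mutex_unlock(&t->mutex);
	pthread_mutex_destroy(&t->mutex);
}
```

Swapping the join and the lock in table_destroy() would reproduce the
deadlock described above whenever the worker has started but not yet taken
the mutex.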

^ permalink raw reply

* [PATCH net-next] cxgb4: Ripping out old hard-wired initialization code in driver
From: Hariprasad Shenai @ 2015-01-13  9:49 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, Hariprasad Shenai

Remove the old hard-wired initialization code in the driver, which is no
longer used. Also deprecate a few module parameters.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  480 +++--------------------
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |   98 +----
 2 files changed, 58 insertions(+), 520 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 23ae0b7..082a596 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -79,99 +79,6 @@
 #define DRV_VERSION "2.0.0-ko"
 #define DRV_DESC "Chelsio T4/T5 Network Driver"
 
-/*
- * Max interrupt hold-off timer value in us.  Queues fall back to this value
- * under extreme memory pressure so it's largish to give the system time to
- * recover.
- */
-#define MAX_SGE_TIMERVAL 200U
-
-enum {
-	/*
-	 * Physical Function provisioning constants.
-	 */
-	PFRES_NVI = 4,			/* # of Virtual Interfaces */
-	PFRES_NETHCTRL = 128,		/* # of EQs used for ETH or CTRL Qs */
-	PFRES_NIQFLINT = 128,		/* # of ingress Qs/w Free List(s)/intr
-					 */
-	PFRES_NEQ = 256,		/* # of egress queues */
-	PFRES_NIQ = 0,			/* # of ingress queues */
-	PFRES_TC = 0,			/* PCI-E traffic class */
-	PFRES_NEXACTF = 128,		/* # of exact MPS filters */
-
-	PFRES_R_CAPS = FW_CMD_CAP_PF,
-	PFRES_WX_CAPS = FW_CMD_CAP_PF,
-
-#ifdef CONFIG_PCI_IOV
-	/*
-	 * Virtual Function provisioning constants.  We need two extra Ingress
-	 * Queues with Interrupt capability to serve as the VF's Firmware
-	 * Event Queue and Forwarded Interrupt Queue (when using MSI mode) --
-	 * neither will have Free Lists associated with them).  For each
-	 * Ethernet/Control Egress Queue and for each Free List, we need an
-	 * Egress Context.
-	 */
-	VFRES_NPORTS = 1,		/* # of "ports" per VF */
-	VFRES_NQSETS = 2,		/* # of "Queue Sets" per VF */
-
-	VFRES_NVI = VFRES_NPORTS,	/* # of Virtual Interfaces */
-	VFRES_NETHCTRL = VFRES_NQSETS,	/* # of EQs used for ETH or CTRL Qs */
-	VFRES_NIQFLINT = VFRES_NQSETS+2,/* # of ingress Qs/w Free List(s)/intr */
-	VFRES_NEQ = VFRES_NQSETS*2,	/* # of egress queues */
-	VFRES_NIQ = 0,			/* # of non-fl/int ingress queues */
-	VFRES_TC = 0,			/* PCI-E traffic class */
-	VFRES_NEXACTF = 16,		/* # of exact MPS filters */
-
-	VFRES_R_CAPS = FW_CMD_CAP_DMAQ|FW_CMD_CAP_VF|FW_CMD_CAP_PORT,
-	VFRES_WX_CAPS = FW_CMD_CAP_DMAQ|FW_CMD_CAP_VF,
-#endif
-};
-
-/*
- * Provide a Port Access Rights Mask for the specified PF/VF.  This is very
- * static and likely not to be useful in the long run.  We really need to
- * implement some form of persistent configuration which the firmware
- * controls.
- */
-static unsigned int pfvfres_pmask(struct adapter *adapter,
-				  unsigned int pf, unsigned int vf)
-{
-	unsigned int portn, portvec;
-
-	/*
-	 * Give PF's access to all of the ports.
-	 */
-	if (vf == 0)
-		return FW_PFVF_CMD_PMASK_M;
-
-	/*
-	 * For VFs, we'll assign them access to the ports based purely on the
-	 * PF.  We assign active ports in order, wrapping around if there are
-	 * fewer active ports than PFs: e.g. active port[pf % nports].
-	 * Unfortunately the adapter's port_info structs haven't been
-	 * initialized yet so we have to compute this.
-	 */
-	if (adapter->params.nports == 0)
-		return 0;
-
-	portn = pf % adapter->params.nports;
-	portvec = adapter->params.portvec;
-	for (;;) {
-		/*
-		 * Isolate the lowest set bit in the port vector.  If we're at
-		 * the port number that we want, return that as the pmask.
-		 * otherwise mask that bit out of the port vector and
-		 * decrement our port number ...
-		 */
-		unsigned int pmask = portvec ^ (portvec & (portvec-1));
-		if (portn == 0)
-			return pmask;
-		portn--;
-		portvec &= ~pmask;
-	}
-	/*NOTREACHED*/
-}
-
 enum {
 	MAX_TXQ_ENTRIES      = 16384,
 	MAX_CTRL_TXQ_ENTRIES = 1024,
@@ -264,7 +171,8 @@ MODULE_PARM_DESC(force_init, "Forcibly become Master PF and initialize adapter")
 static uint force_old_init;
 
 module_param(force_old_init, uint, 0644);
-MODULE_PARM_DESC(force_old_init, "Force old initialization sequence");
+MODULE_PARM_DESC(force_old_init, "Force old initialization sequence, deprecated"
+		 " parameter");
 
 static int dflt_msg_enable = DFLT_MSG_ENABLE;
 
@@ -293,13 +201,14 @@ static unsigned int intr_holdoff[SGE_NTIMERS - 1] = { 5, 10, 20, 50, 100 };
 
 module_param_array(intr_holdoff, uint, NULL, 0644);
 MODULE_PARM_DESC(intr_holdoff, "values for queue interrupt hold-off timers "
-		 "0..4 in microseconds");
+		 "0..4 in microseconds, deprecated parameter");
 
 static unsigned int intr_cnt[SGE_NCOUNTERS - 1] = { 4, 8, 16 };
 
 module_param_array(intr_cnt, uint, NULL, 0644);
 MODULE_PARM_DESC(intr_cnt,
-		 "thresholds 1..3 for queue interrupt packet counters");
+		 "thresholds 1..3 for queue interrupt packet counters, "
+		 "deprecated parameter");
 
 /*
  * Normally we tell the chip to deliver Ingress Packets into our DMA buffers
@@ -319,7 +228,8 @@ static bool vf_acls;
 
 #ifdef CONFIG_PCI_IOV
 module_param(vf_acls, bool, 0644);
-MODULE_PARM_DESC(vf_acls, "if set enable virtualization L2 ACL enforcement");
+MODULE_PARM_DESC(vf_acls, "if set enable virtualization L2 ACL enforcement, "
+		 "deprecated parameter");
 
 /* Configure the number of PCI-E Virtual Function which are to be instantiated
  * on SR-IOV Capable Physical Functions.
@@ -341,32 +251,11 @@ module_param(select_queue, int, 0644);
 MODULE_PARM_DESC(select_queue,
 		 "Select between kernel provided method of selecting or driver method of selecting TX queue. Default is kernel method.");
 
-/*
- * The filter TCAM has a fixed portion and a variable portion.  The fixed
- * portion can match on source/destination IP IPv4/IPv6 addresses and TCP/UDP
- * ports.  The variable portion is 36 bits which can include things like Exact
- * Match MAC Index (9 bits), Ether Type (16 bits), IP Protocol (8 bits),
- * [Inner] VLAN Tag (17 bits), etc. which, if all were somehow selected, would
- * far exceed the 36-bit budget for this "compressed" header portion of the
- * filter.  Thus, we have a scarce resource which must be carefully managed.
- *
- * By default we set this up to mostly match the set of filter matching
- * capabilities of T3 but with accommodations for some of T4's more
- * interesting features:
- *
- *   { IP Fragment (1), MPS Match Type (3), IP Protocol (8),
- *     [Inner] VLAN (17), Port (3), FCoE (1) }
- */
-enum {
-	TP_VLAN_PRI_MAP_DEFAULT = HW_TPL_FR_MT_PR_IV_P_FC,
-	TP_VLAN_PRI_MAP_FIRST = FCOE_S,
-	TP_VLAN_PRI_MAP_LAST = FRAGMENTATION_S,
-};
-
-static unsigned int tp_vlan_pri_map = TP_VLAN_PRI_MAP_DEFAULT;
+static unsigned int tp_vlan_pri_map = HW_TPL_FR_MT_PR_IV_P_FC;
 
 module_param(tp_vlan_pri_map, uint, 0644);
-MODULE_PARM_DESC(tp_vlan_pri_map, "global compressed filter configuration");
+MODULE_PARM_DESC(tp_vlan_pri_map, "global compressed filter configuration, "
+		 "deprecated parameter");
 
 static struct dentry *cxgb4_debugfs_root;
 
@@ -5225,12 +5114,9 @@ static int adap_init0_config(struct adapter *adapter, int reset)
 	if (ret < 0)
 		goto bye;
 
-	/*
-	 * Return successfully and note that we're operating with parameters
-	 * not supplied by the driver, rather than from hard-wired
-	 * initialization constants burried in the driver.
+	/* Emit Firmware Configuration File information and return
+	 * successfully.
 	 */
-	adapter->flags |= USING_SOFT_PARAMS;
 	dev_info(adapter->pdev_dev, "Successfully configured using Firmware "\
 		 "Configuration File \"%s\", version %#x, computed checksum %#x\n",
 		 config_name, finiver, cfcsum);
@@ -5248,248 +5134,6 @@ bye:
 	return ret;
 }
 
-/*
- * Attempt to initialize the adapter via hard-coded, driver supplied
- * parameters ...
- */
-static int adap_init0_no_config(struct adapter *adapter, int reset)
-{
-	struct sge *s = &adapter->sge;
-	struct fw_caps_config_cmd caps_cmd;
-	u32 v;
-	int i, ret;
-
-	/*
-	 * Reset device if necessary
-	 */
-	if (reset) {
-		ret = t4_fw_reset(adapter, adapter->mbox,
-				  PIORSTMODE_F | PIORST_F);
-		if (ret < 0)
-			goto bye;
-	}
-
-	/*
-	 * Get device capabilities and select which we'll be using.
-	 */
-	memset(&caps_cmd, 0, sizeof(caps_cmd));
-	caps_cmd.op_to_write = htonl(FW_CMD_OP_V(FW_CAPS_CONFIG_CMD) |
-				     FW_CMD_REQUEST_F | FW_CMD_READ_F);
-	caps_cmd.cfvalid_to_len16 = htonl(FW_LEN16(caps_cmd));
-	ret = t4_wr_mbox(adapter, adapter->mbox, &caps_cmd, sizeof(caps_cmd),
-			 &caps_cmd);
-	if (ret < 0)
-		goto bye;
-
-	if (caps_cmd.niccaps & htons(FW_CAPS_CONFIG_NIC_VM)) {
-		if (!vf_acls)
-			caps_cmd.niccaps ^= htons(FW_CAPS_CONFIG_NIC_VM);
-		else
-			caps_cmd.niccaps = htons(FW_CAPS_CONFIG_NIC_VM);
-	} else if (vf_acls) {
-		dev_err(adapter->pdev_dev, "virtualization ACLs not supported");
-		goto bye;
-	}
-	caps_cmd.op_to_write = htonl(FW_CMD_OP_V(FW_CAPS_CONFIG_CMD) |
-			      FW_CMD_REQUEST_F | FW_CMD_WRITE_F);
-	ret = t4_wr_mbox(adapter, adapter->mbox, &caps_cmd, sizeof(caps_cmd),
-			 NULL);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Tweak configuration based on system architecture, module
-	 * parameters, etc.
-	 */
-	ret = adap_init0_tweaks(adapter);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Select RSS Global Mode we want to use.  We use "Basic Virtual"
-	 * mode which maps each Virtual Interface to its own section of
-	 * the RSS Table and we turn on all map and hash enables ...
-	 */
-	adapter->flags |= RSS_TNLALLLOOKUP;
-	ret = t4_config_glbl_rss(adapter, adapter->mbox,
-				 FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL,
-				 FW_RSS_GLB_CONFIG_CMD_TNLMAPEN_F |
-				 FW_RSS_GLB_CONFIG_CMD_HASHTOEPLITZ_F |
-				 ((adapter->flags & RSS_TNLALLLOOKUP) ?
-					FW_RSS_GLB_CONFIG_CMD_TNLALLLKP_F : 0));
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Set up our own fundamental resource provisioning ...
-	 */
-	ret = t4_cfg_pfvf(adapter, adapter->mbox, adapter->fn, 0,
-			  PFRES_NEQ, PFRES_NETHCTRL,
-			  PFRES_NIQFLINT, PFRES_NIQ,
-			  PFRES_TC, PFRES_NVI,
-			  FW_PFVF_CMD_CMASK_M,
-			  pfvfres_pmask(adapter, adapter->fn, 0),
-			  PFRES_NEXACTF,
-			  PFRES_R_CAPS, PFRES_WX_CAPS);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Perform low level SGE initialization.  We need to do this before we
-	 * send the firmware the INITIALIZE command because that will cause
-	 * any other PF Drivers which are waiting for the Master
-	 * Initialization to proceed forward.
-	 */
-	for (i = 0; i < SGE_NTIMERS - 1; i++)
-		s->timer_val[i] = min(intr_holdoff[i], MAX_SGE_TIMERVAL);
-	s->timer_val[SGE_NTIMERS - 1] = MAX_SGE_TIMERVAL;
-	s->counter_val[0] = 1;
-	for (i = 1; i < SGE_NCOUNTERS; i++)
-		s->counter_val[i] = min(intr_cnt[i - 1], THRESHOLD_0_M);
-	t4_sge_init(adapter);
-
-#ifdef CONFIG_PCI_IOV
-	/*
-	 * Provision resource limits for Virtual Functions.  We currently
-	 * grant them all the same static resource limits except for the Port
-	 * Access Rights Mask which we're assigning based on the PF.  All of
-	 * the static provisioning stuff for both the PF and VF really needs
-	 * to be managed in a persistent manner for each device which the
-	 * firmware controls.
-	 */
-	{
-		int pf, vf;
-
-		for (pf = 0; pf < ARRAY_SIZE(num_vf); pf++) {
-			if (num_vf[pf] <= 0)
-				continue;
-
-			/* VF numbering starts at 1! */
-			for (vf = 1; vf <= num_vf[pf]; vf++) {
-				ret = t4_cfg_pfvf(adapter, adapter->mbox,
-						  pf, vf,
-						  VFRES_NEQ, VFRES_NETHCTRL,
-						  VFRES_NIQFLINT, VFRES_NIQ,
-						  VFRES_TC, VFRES_NVI,
-						  FW_PFVF_CMD_CMASK_M,
-						  pfvfres_pmask(
-						  adapter, pf, vf),
-						  VFRES_NEXACTF,
-						  VFRES_R_CAPS, VFRES_WX_CAPS);
-				if (ret < 0)
-					dev_warn(adapter->pdev_dev,
-						 "failed to "\
-						 "provision pf/vf=%d/%d; "
-						 "err=%d\n", pf, vf, ret);
-			}
-		}
-	}
-#endif
-
-	/*
-	 * Set up the default filter mode.  Later we'll want to implement this
-	 * via a firmware command, etc. ...  This needs to be done before the
-	 * firmare initialization command ...  If the selected set of fields
-	 * isn't equal to the default value, we'll need to make sure that the
-	 * field selections will fit in the 36-bit budget.
-	 */
-	if (tp_vlan_pri_map != TP_VLAN_PRI_MAP_DEFAULT) {
-		int j, bits = 0;
-
-		for (j = TP_VLAN_PRI_MAP_FIRST; j <= TP_VLAN_PRI_MAP_LAST; j++)
-			switch (tp_vlan_pri_map & (1 << j)) {
-			case 0:
-				/* compressed filter field not enabled */
-				break;
-			case FCOE_F:
-				bits +=  1;
-				break;
-			case PORT_F:
-				bits +=  3;
-				break;
-			case VNIC_F:
-				bits += 17;
-				break;
-			case VLAN_F:
-				bits += 17;
-				break;
-			case TOS_F:
-				bits +=  8;
-				break;
-			case PROTOCOL_F:
-				bits +=  8;
-				break;
-			case ETHERTYPE_F:
-				bits += 16;
-				break;
-			case MACMATCH_F:
-				bits +=  9;
-				break;
-			case MPSHITTYPE_F:
-				bits +=  3;
-				break;
-			case FRAGMENTATION_F:
-				bits +=  1;
-				break;
-			}
-
-		if (bits > 36) {
-			dev_err(adapter->pdev_dev,
-				"tp_vlan_pri_map=%#x needs %d bits > 36;"\
-				" using %#x\n", tp_vlan_pri_map, bits,
-				TP_VLAN_PRI_MAP_DEFAULT);
-			tp_vlan_pri_map = TP_VLAN_PRI_MAP_DEFAULT;
-		}
-	}
-	v = tp_vlan_pri_map;
-	t4_write_indirect(adapter, TP_PIO_ADDR_A, TP_PIO_DATA_A,
-			  &v, 1, TP_VLAN_PRI_MAP_A);
-
-	/*
-	 * We need Five Tuple Lookup mode to be set in TP_GLOBAL_CONFIG order
-	 * to support any of the compressed filter fields above.  Newer
-	 * versions of the firmware do this automatically but it doesn't hurt
-	 * to set it here.  Meanwhile, we do _not_ need to set Lookup Every
-	 * Packet in TP_INGRESS_CONFIG to support matching non-TCP packets
-	 * since the firmware automatically turns this on and off when we have
-	 * a non-zero number of filters active (since it does have a
-	 * performance impact).
-	 */
-	if (tp_vlan_pri_map)
-		t4_set_reg_field(adapter, TP_GLOBAL_CONFIG_A,
-				 FIVETUPLELOOKUP_V(FIVETUPLELOOKUP_M),
-				 FIVETUPLELOOKUP_V(FIVETUPLELOOKUP_M));
-
-	/*
-	 * Tweak some settings.
-	 */
-	t4_write_reg(adapter, TP_SHIFT_CNT_A, SYNSHIFTMAX_V(6) |
-		     RXTSHIFTMAXR1_V(4) | RXTSHIFTMAXR2_V(15) |
-		     PERSHIFTBACKOFFMAX_V(8) | PERSHIFTMAX_V(8) |
-		     KEEPALIVEMAXR1_V(4) | KEEPALIVEMAXR2_V(9));
-
-	/*
-	 * Get basic stuff going by issuing the Firmware Initialize command.
-	 * Note that this _must_ be after all PFVF commands ...
-	 */
-	ret = t4_fw_initialize(adapter, adapter->mbox);
-	if (ret < 0)
-		goto bye;
-
-	/*
-	 * Return successfully!
-	 */
-	dev_info(adapter->pdev_dev, "Successfully configured using built-in "\
-		 "driver parameters\n");
-	return 0;
-
-	/*
-	 * Something bad happened.  Return the error ...
-	 */
-bye:
-	return ret;
-}
-
 static struct fw_info fw_info_array[] = {
 	{
 		.chip = CHELSIO_T4,
@@ -5662,88 +5306,58 @@ static int adap_init0(struct adapter *adap)
 	adap->params.nports = hweight32(port_vec);
 	adap->params.portvec = port_vec;
 
-	/*
-	 * If the firmware is initialized already (and we're not forcing a
-	 * master initialization), note that we're living with existing
-	 * adapter parameters.  Otherwise, it's time to try initializing the
-	 * adapter ...
+	/* If the firmware is initialized already, emit a simple note to that
+	 * effect. Otherwise, it's time to try initializing the adapter.
 	 */
 	if (state == DEV_STATE_INIT) {
 		dev_info(adap->pdev_dev, "Coming up as %s: "\
 			 "Adapter already initialized\n",
 			 adap->flags & MASTER_PF ? "MASTER" : "SLAVE");
-		adap->flags |= USING_SOFT_PARAMS;
 	} else {
 		dev_info(adap->pdev_dev, "Coming up as MASTER: "\
 			 "Initializing adapter\n");
-		/*
-		 * If the firmware doesn't support Configuration
-		 * Files warn user and exit,
+
+		/* Find out whether we're dealing with a version of the
+		 * firmware which has configuration file support.
 		 */
-		if (ret < 0)
-			dev_warn(adap->pdev_dev, "Firmware doesn't support "
-				 "configuration file.\n");
-		if (force_old_init)
-			ret = adap_init0_no_config(adap, reset);
-		else {
-			/*
-			 * Find out whether we're dealing with a version of
-			 * the firmware which has configuration file support.
-			 */
-			params[0] = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DEV) |
-				     FW_PARAMS_PARAM_X_V(
-					     FW_PARAMS_PARAM_DEV_CF));
-			ret = t4_query_params(adap, adap->mbox, adap->fn, 0, 1,
-					      params, val);
-
-			/*
-			 * If the firmware doesn't support Configuration
-			 * Files, use the old Driver-based, hard-wired
-			 * initialization.  Otherwise, try using the
-			 * Configuration File support and fall back to the
-			 * Driver-based initialization if there's no
-			 * Configuration File found.
-			 */
-			if (ret < 0)
-				ret = adap_init0_no_config(adap, reset);
-			else {
-				/*
-				 * The firmware provides us with a memory
-				 * buffer where we can load a Configuration
-				 * File from the host if we want to override
-				 * the Configuration File in flash.
-				 */
+		params[0] = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DEV) |
+			     FW_PARAMS_PARAM_X_V(FW_PARAMS_PARAM_DEV_CF));
+		ret = t4_query_params(adap, adap->mbox, adap->fn, 0, 1,
+				      params, val);
 
-				ret = adap_init0_config(adap, reset);
-				if (ret == -ENOENT) {
-					dev_info(adap->pdev_dev,
-					    "No Configuration File present "
-					    "on adapter. Using hard-wired "
-					    "configuration parameters.\n");
-					ret = adap_init0_no_config(adap, reset);
-				}
-			}
+		/* If the firmware doesn't support Configuration Files,
+		 * return an error.
+		 */
+		if (ret < 0) {
+			dev_err(adap->pdev_dev, "firmware doesn't support "
+				"Firmware Configuration Files\n");
+			goto bye;
+		}
+
+		/* The firmware provides us with a memory buffer where we can
+		 * load a Configuration File from the host if we want to
+		 * override the Configuration File in flash.
+		 */
+		ret = adap_init0_config(adap, reset);
+		if (ret == -ENOENT) {
+			dev_err(adap->pdev_dev, "no Configuration File "
+				"present on adapter.\n");
+			goto bye;
 		}
 		if (ret < 0) {
-			dev_err(adap->pdev_dev,
-				"could not initialize adapter, error %d\n",
-				-ret);
+			dev_err(adap->pdev_dev, "could not initialize "
+				"adapter, error %d\n", -ret);
 			goto bye;
 		}
 	}
 
-	/*
-	 * If we're living with non-hard-coded parameters (either from a
-	 * Firmware Configuration File or values programmed by a different PF
-	 * Driver), give the SGE code a chance to pull in anything that it
-	 * needs ...  Note that this must be called after we retrieve our VPD
-	 * parameters in order to know how to convert core ticks to seconds.
+	/* Give the SGE code a chance to pull in anything that it needs ...
+	 * Note that this must be called after we retrieve our VPD parameters
+	 * in order to know how to convert core ticks to seconds, etc.
 	 */
-	if (adap->flags & USING_SOFT_PARAMS) {
-		ret = t4_sge_init(adap);
-		if (ret < 0)
-			goto bye;
-	}
+	ret = t4_sge_init(adap);
+	if (ret < 0)
+		goto bye;
 
 	if (is_bypass_device(adap->pdev->device))
 		adap->params.bypass = 1;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index a79fa6a..ca42e2e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2742,24 +2742,11 @@ void t4_sge_stop(struct adapter *adap)
 }
 
 /**
- *	t4_sge_init - initialize SGE
+ *	t4_sge_init_soft - grab core SGE values needed by SGE code
  *	@adap: the adapter
  *
- *	Performs SGE initialization needed every time after a chip reset.
- *	We do not initialize any of the queues here, instead the driver
- *	top-level must request them individually.
- *
- *	Called in two different modes:
- *
- *	 1. Perform actual hardware initialization and record hard-coded
- *	    parameters which were used.  This gets used when we're the
- *	    Master PF and the Firmware Configuration File support didn't
- *	    work for some reason.
- *
- *	 2. We're not the Master PF or initialization was performed with
- *	    a Firmware Configuration File.  In this case we need to grab
- *	    any of the SGE operating parameters that we need to have in
- *	    order to do our job and make sure we can live with them ...
+ *	We need to grab the SGE operating parameters that we need to have
+ *	in order to do our job and make sure we can live with them.
  */
 
 static int t4_sge_init_soft(struct adapter *adap)
@@ -2852,73 +2839,13 @@ static int t4_sge_init_soft(struct adapter *adap)
 	return 0;
 }
 
-static int t4_sge_init_hard(struct adapter *adap)
-{
-	struct sge *s = &adap->sge;
-
-	/*
-	 * Set up our basic SGE mode to deliver CPL messages to our Ingress
-	 * Queue and Packet Date to the Free List.
-	 */
-	t4_set_reg_field(adap, SGE_CONTROL_A, RXPKTCPLMODE_F, RXPKTCPLMODE_F);
-
-	/*
-	 * Set up to drop DOORBELL writes when the DOORBELL FIFO overflows
-	 * and generate an interrupt when this occurs so we can recover.
-	 */
-	if (is_t4(adap->params.chip)) {
-		t4_set_reg_field(adap, SGE_DBFIFO_STATUS_A,
-				 HP_INT_THRESH_V(HP_INT_THRESH_M) |
-				 LP_INT_THRESH_V(LP_INT_THRESH_M),
-				 HP_INT_THRESH_V(dbfifo_int_thresh) |
-				 LP_INT_THRESH_V(dbfifo_int_thresh));
-	} else {
-		t4_set_reg_field(adap, SGE_DBFIFO_STATUS_A,
-				 LP_INT_THRESH_T5_V(LP_INT_THRESH_T5_M),
-				 LP_INT_THRESH_T5_V(dbfifo_int_thresh));
-		t4_set_reg_field(adap, SGE_DBFIFO_STATUS2_A,
-				 HP_INT_THRESH_T5_V(HP_INT_THRESH_T5_M),
-				 HP_INT_THRESH_T5_V(dbfifo_int_thresh));
-	}
-	t4_set_reg_field(adap, SGE_DOORBELL_CONTROL_A, ENABLE_DROP_F,
-			 ENABLE_DROP_F);
-
-	/*
-	 * SGE_FL_BUFFER_SIZE0 (RX_SMALL_PG_BUF) is set up by
-	 * t4_fixup_host_params().
-	 */
-	s->fl_pg_order = FL_PG_ORDER;
-	if (s->fl_pg_order)
-		t4_write_reg(adap,
-			     SGE_FL_BUFFER_SIZE0_A+RX_LARGE_PG_BUF*sizeof(u32),
-			     PAGE_SIZE << FL_PG_ORDER);
-	t4_write_reg(adap, SGE_FL_BUFFER_SIZE0_A+RX_SMALL_MTU_BUF*sizeof(u32),
-		     FL_MTU_SMALL_BUFSIZE(adap));
-	t4_write_reg(adap, SGE_FL_BUFFER_SIZE0_A+RX_LARGE_MTU_BUF*sizeof(u32),
-		     FL_MTU_LARGE_BUFSIZE(adap));
-
-	/*
-	 * Note that the SGE Ingress Packet Count Interrupt Threshold and
-	 * Timer Holdoff values must be supplied by our caller.
-	 */
-	t4_write_reg(adap, SGE_INGRESS_RX_THRESHOLD_A,
-		     THRESHOLD_0_V(s->counter_val[0]) |
-		     THRESHOLD_1_V(s->counter_val[1]) |
-		     THRESHOLD_2_V(s->counter_val[2]) |
-		     THRESHOLD_3_V(s->counter_val[3]));
-	t4_write_reg(adap, SGE_TIMER_VALUE_0_AND_1_A,
-		     TIMERVALUE0_V(us_to_core_ticks(adap, s->timer_val[0])) |
-		     TIMERVALUE1_V(us_to_core_ticks(adap, s->timer_val[1])));
-	t4_write_reg(adap, SGE_TIMER_VALUE_2_AND_3_A,
-		     TIMERVALUE2_V(us_to_core_ticks(adap, s->timer_val[2])) |
-		     TIMERVALUE3_V(us_to_core_ticks(adap, s->timer_val[3])));
-	t4_write_reg(adap, SGE_TIMER_VALUE_4_AND_5_A,
-		     TIMERVALUE4_V(us_to_core_ticks(adap, s->timer_val[4])) |
-		     TIMERVALUE5_V(us_to_core_ticks(adap, s->timer_val[5])));
-
-	return 0;
-}
-
+/**
+ *     t4_sge_init - initialize SGE
+ *     @adap: the adapter
+ *
+ *     Perform low-level SGE code initialization needed every time after a
+ *     chip reset.
+ */
 int t4_sge_init(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
@@ -2959,10 +2886,7 @@ int t4_sge_init(struct adapter *adap)
 		s->fl_align = max(ingpadboundary, ingpackboundary);
 	}
 
-	if (adap->flags & USING_SOFT_PARAMS)
-		ret = t4_sge_init_soft(adap);
-	else
-		ret = t4_sge_init_hard(adap);
+	ret = t4_sge_init_soft(adap);
 	if (ret < 0)
 		return ret;
 
-- 
1.7.1

^ permalink raw reply related

* RE: [PATCH net-next] rhashtable: Lower/upper bucket may map to same lock while shrinking
From: David Laight @ 2015-01-13  9:49 UTC (permalink / raw)
  To: 'Thomas Graf', davem@davemloft.net, Fengguang Wu
  Cc: LKP, linux-kernel@vger.kernel.org,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	netdev@vger.kernel.org
In-Reply-To: <20150112235821.GB16617@casper.infradead.org>

From: Thomas Graf
> Each per bucket lock covers a configurable number of buckets. While
> shrinking, two buckets in the old table contain entries for a single
> bucket in the new table. We need to lock down both while linking.
> Check if they are protected by different locks to avoid a recursive
> lock.

A thought: could the shrunk table use the same locks as the lower half
of the old table?

I also wonder whether shrinking hash tables is ever actually worth
the effort. Most likely they'll need to grow again very quickly.

>  		spin_lock_bh(old_bucket_lock1);
> -		spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> -		spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> +
> +		/* Depending on the lock per buckets mapping, the bucket in
> +		 * the lower and upper region may map to the same lock.
> +		 */
> +		if (old_bucket_lock1 != old_bucket_lock2) {
> +			spin_lock_bh_nested(old_bucket_lock2, RHT_LOCK_NESTED);
> +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED2);
> +		} else {
> +			spin_lock_bh_nested(new_bucket_lock, RHT_LOCK_NESTED);
> +		}

Acquiring 3 locks of much the same type looks like a locking hierarchy
violation just waiting to happen.
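
For reference, the case the patch guards against can be sketched in
userspace C. This is only an illustration. It assumes a bucket's lock is
picked as (bucket & (nlocks - 1)), with power-of-two sizes, and that
old buckets i and i + new_size fold into new bucket i. lock_index() and
old_buckets_share_lock() are hypothetical helper names, not rhashtable
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if n is a non-zero power of two. */
static bool is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Assumed mapping: each lock covers every nlocks-th bucket. */
static size_t lock_index(size_t bucket, size_t nlocks)
{
	assert(is_pow2(nlocks));
	return bucket & (nlocks - 1);
}

/*
 * While shrinking from 2 * new_size to new_size buckets, old buckets
 * i (lower half) and i + new_size (upper half) are linked into new
 * bucket i. Report whether both old buckets are guarded by one lock.
 */
static bool old_buckets_share_lock(size_t i, size_t new_size, size_t nlocks)
{
	assert(is_pow2(new_size));
	assert(i < new_size);
	return lock_index(i, nlocks) == lock_index(i + new_size, nlocks);
}
```

Under these assumptions the two old buckets share a lock whenever the
lock array is no larger than the new table, e.g. 4 locks and a shrink
from 16 to 8 buckets, which is when the second spin_lock_bh_nested()
on the old table has to be skipped.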

	David

^ permalink raw reply

* Re: [PATCH net-next] rhashtable: unnecessary to use delayed work
From: Ying Xue @ 2015-01-13  9:48 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev
In-Reply-To: <20150113093550.GG20387@casper.infradead.org>

On 01/13/2015 05:35 PM, Thomas Graf wrote:
> On 01/13/15 at 05:00pm, Ying Xue wrote:
>> When we put our declared work task in the global workqueue with
>> schedule_delayed_work(), its delay parameter is always zero.
>> Therefore, we should define a normal work in rhashtable structure
>> instead of a delayed work.
>>
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
> 
>> @@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
>>  
>>  	mutex_lock(&ht->mutex);
>>  
>> -	cancel_delayed_work(&ht->run_work);
>> +	cancel_work_sync(&ht->run_work);
>>  	bucket_table_free(rht_dereference(ht->tbl, ht));
>>  
>>  	mutex_unlock(&ht->mutex);
> 
> I like the patch!
> 
> I think it introduces a possible dead lock though (see below). OTOH, it
> could actually explain the reason for the 0day lock debug splash that
> was reported.
> 
> Dead lock: The worker could already have been kicked off but was
> interrupted before it acquired ht->mutex. rhashtable_destroy() is
> called and acquired ht->mutex. cancel_work_sync() waits for worker to
> finish while holding ht->mutex. Worker can't finish because it needs to
> acquire ht->mutex to do so.
> 
> For the very same reason the reported warning could have been triggered.
> Instead of the dead lock, it would have called bucket_table_free()
> with a deferred resizer still underway.
> 
> What about we do something like this?
> 
> void rhashtable_destroy(struct rhashtable *ht)
> {
>         ht->being_destroyed = true;
> 	cancel_work_sync(&ht->run_work);
> 
> 	mutex_lock(&ht->mutex);
> 	bucket_table_free(rht_dereference(ht->tbl, ht));
> 	mutex_unlock(&ht->mutex);
> }
> 

Damn! I overlooked the deadlock scenario you describe above. Thank you
for the nice catch!

> If you agree we can explain this shortly in the commit message and add:
> Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
> 

OK, I will deliver the next version.

By the way, I think we should check the following condition before
calling cancel_work_sync(); otherwise we may cancel an uninitialized
work item.

(ht->p.grow_decision || ht->p.shrink_decision)

What do you think?
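
To make the question concrete, here is a small userspace model of the
destroy path with that check added. It is a sketch, not a patch: it
assumes the init path only sets up run_work when a grow or shrink
decision callback is supplied, and struct ht_model, ht_model_init(),
ht_model_destroy() and grow_above_75() are made-up names. The counter
stands in for the real cancel_work_sync() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the decision callbacks in the parameters. */
struct ht_params {
	bool (*grow_decision)(size_t nelems, size_t size);
	bool (*shrink_decision)(size_t nelems, size_t size);
};

/* Hypothetical, stripped-down model of the hash table state. */
struct ht_model {
	struct ht_params p;
	bool work_initialized;	/* models "run_work has been set up" */
	bool being_destroyed;
	int cancel_calls;	/* counts modelled cancel_work_sync() calls */
};

/* Example decision function: grow above 75% load. */
static bool grow_above_75(size_t nelems, size_t size)
{
	return nelems * 4 > size * 3;
}

static void ht_model_init(struct ht_model *ht, const struct ht_params *p)
{
	ht->p = *p;
	ht->being_destroyed = false;
	ht->cancel_calls = 0;
	/* Assumption: the work is only set up if resizing can happen. */
	ht->work_initialized = p->grow_decision || p->shrink_decision;
}

static void ht_model_destroy(struct ht_model *ht)
{
	ht->being_destroyed = true;

	/* Only cancel work that was set up, and do it before locking. */
	if (ht->p.grow_decision || ht->p.shrink_decision) {
		assert(ht->work_initialized);
		ht->cancel_calls++;
	}

	/* The mutex would be taken and the bucket table freed here. */
}
```

With no decision callbacks the modelled cancel is never reached; with
either one set it runs exactly once, before the point where the mutex
would be taken.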

Regards,
Ying

> 

^ permalink raw reply

* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Hannes Frederic Sowa @ 2015-01-13 10:11 UTC (permalink / raw)
  To: Rahul Sharma; +Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel
In-Reply-To: <CAFB3abxcg4gdEh4CJHd_Vx8mZKFmO3kMG=pDjkQYS5awTzFbSQ@mail.gmail.com>

On Di, 2015-01-13 at 09:53 +0530, Rahul Sharma wrote:
> On Mon, Jan 12, 2015 at 5:21 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Mon, Jan 12, 2015 at 04:38:16PM +0530, Rahul Sharma wrote:
> >> Hi Pablo, Hannes
> >>
> >> On Fri, Jan 9, 2015 at 9:20 PM, Hannes Frederic Sowa
> >> <hannes@stressinduktion.org> wrote:
> >> > On Fr, 2015-01-09 at 12:45 +0100, Pablo Neira Ayuso wrote:
> >> >> Hi Hannes,
> >> >>
> >> >> On Fri, Jan 09, 2015 at 12:34:15PM +0100, Hannes Frederic Sowa wrote:
> >> >> > On Fri, Jan 9, 2015, at 08:18, Rahul Sharma wrote:
> >> >> > > Hi Pablo,
> >> >> > >
> >> >> > > On Fri, Jan 9, 2015 at 5:35 AM, Pablo Neira Ayuso <pablo@netfilter.org>
> >> >> > > wrote:
> >> >> > > > On Thu, Jan 08, 2015 at 11:39:16PM +0100, Hannes Frederic Sowa wrote:
> >> >> > > >> Hi Pablo,
> >> >> > > >>
> >> >> > > >> On Thu, Jan 8, 2015, at 21:53, Pablo Neira Ayuso wrote:
> >> >> > > >> > I'm afraid we cannot just get rid of that !ipv6_ext_hdr() check. The
> >> >> > > >> > ipv6_find_hdr() function is designed to return the transport protocol.
> >> >> > > >> > After the proposed change, it will return extension header numbers.
> >> >> > > >> > This will break existing ip6tables rulesets since the `-p' option
> >> >> > > >> > relies on this function to match the transport protocol.
> >> >> > > >> >
> >> >> > > >> > Note that the AH header is skipped (see code a bit below this
> >> >> > > >> > problematic fragmentation handling) so the follow up header after the
> >> >> > > >> > AH header is returned as the transport header.
> >> >> > > >> >
> >> >> > > >> > We can probably return the AH protocol number for non-1st fragments.
> >> >> > > >> > However, that would be something new to ip6tables since nobody has
> >> >> > > >> > ever seen packet matching `-p ah' rules. Thus, we restore control to
> >> >> > > >> > the user to allow this, but we would accept all kind of fragmented AH
> >> >> > > >> > traffic through the firewall since we cannot know what transport
> >> >> > > >> > protocol contains from non-1st fragments (unless I'm missing anything,
> >> >> > > >> > I need to have a closer look at this again tomorrow with fresher
> >> >> > > >> > mind).
> >> >> > > >>
> >> >> > > >> The code in question is guarded by (_frag_off != 0), so we are
> >> >> > > >> definitely processing a non-1st fragment currently. The -p match would
> >> >> > > >> happen at the time when the packet is reassembled and thus ipv6_find_hdr
> >> >> > > >> will find the real transport (final) header at this point (I hope I
> >> >> > > >> followed the code correctly here).
> >> >> > > >
> >> >> > > > Then, Rahul should get things working by modprobing nf_defrag_ipv6.
> >> >> > >
> >> >> > > I already had nf_defrag_ipv6 installed when the issue occurred. But I
> >> >> > > see ip6table_raw_hook returning NF_DROP for the second fragment.
> >> >> >
> >> >> > That's what I expected. I think the change only affects hooks before
> >> >> > reassembly.
> >> >>
> >> >> reassembly happens at NF_IP6_PRI_CONNTRACK_DEFRAG (-400), so that
> >> >> happens before NF_IP6_PRI_RAW (-300) in IPv6 which is where the raw
> >> >> table is placed.
> >> >
> >> > I tried to reproduce it, but couldn't get non-1st fragments to be
> >> > dropped during traversal of the raw table. They either get dropped
> >> > earlier, during reassembly, or pass.
> >> >
> >> > I agree with Pablo, I also would like to see more data.
> >> >
> >> > Thanks,
> >> > Hannes
> >> >
> >> >
> >>
> >> I enabled pr_debug() and there was no error in nf_ct_frag6_gather().
> >> It seems to have defragmented the packet correctly. As expected,
> >> ipv6_defrag() returns NF_STOLEN for the first packet after queuing it.
> >> For the next fragment, ipv6_defrag() calls nf_ct_frag6_output() after
> >> reassembling it.
> >
> > nf_ct_frag6_output() doesn't exist anymore. You're using an old
> > kernel, you should have started by telling so in your report.
> >
> > See 6aafeef ("netfilter: push reasm skb through instead of original
> > frag skbs").
> 
>  I apologize for not mentioning the kernel version in my first mail. I
> had suspected a problem in ipv6_find_hdr, the code for which was the
> same. Anyway, thanks for the help. I'll try to figure out how to make
> this work in my kernel.

If you have time, could you quickly test a recent net-next kernel?

Thanks,
Hannes

^ permalink raw reply

* Re: why are IPv6 addresses removed on link down
From: Hannes Frederic Sowa @ 2015-01-13 10:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Ahern, netdev@vger.kernel.org
In-Reply-To: <20150112231021.316648e3@urahara>

On Mo, 2015-01-12 at 23:10 -0800, Stephen Hemminger wrote:
> On Mon, 12 Jan 2015 22:06:44 -0700
> David Ahern <dsahern@gmail.com> wrote:
> 
> > We noticed that IPv6 addresses are removed on a link down. e.g.,
> >    ip link set dev eth1 down
> > 
> > 
> > Looking at the code it appears to be this code path in addrconf.c:
> > 
> >          case NETDEV_DOWN:
> >          case NETDEV_UNREGISTER:
> >                  /*
> >                   *      Remove all addresses from this interface.
> >                   */
> >                  addrconf_ifdown(dev, event != NETDEV_DOWN);
> >                  break;
> > 
> > IPv4 addresses are NOT removed on a link down. Is there a particular 
> > reason IPv6 addresses are?
> > 
> > Thanks,
> > David
> 
> See the RFCs that describe how IPv6 does Duplicate Address Detection.
> The address is not valid when the link is down, since DAD is not possible.

It should be no problem if the kernel reacquired them on ifup and did
proper DAD. We simply must not use them while the interface is dead
(also making sure they don't get used for loopback routing).

That the IPv6 addresses get removed is more of a historical artifact
nowadays, I think. It is part of the user space API, and scripts
already deal with it.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH v5] can: Convert to runtime_pm
From: Marc Kleine-Budde @ 2015-01-13 11:08 UTC (permalink / raw)
  To: Sören Brinkmann, Kedareswara rao Appana
  Cc: wg, michal.simek, grant.likely, robh+dt, linux-can, netdev,
	linux-arm-kernel, linux-kernel, devicetree,
	Kedareswara rao Appana
In-Reply-To: <3a3437c5c8ff48d9a45fee7e81fa8dca@BY2FFO11FD058.protection.gbl>

On 01/12/2015 07:45 PM, Sören Brinkmann wrote:
> On Mon, 2015-01-12 at 08:34PM +0530, Kedareswara rao Appana wrote:
>> Instead of enabling/disabling clocks at several locations in the driver,
>> use the runtime_pm framework. This consolidates the actions for runtime PM
>> in the appropriate callbacks and makes the driver more readable and maintainable.
>>
>> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
>> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
>> ---
>> Changes for v5:
>>  - Updated with the review comments.
>>    Updated the remove function to use runtime_pm.
>> Changes for v4:
>>  - Updated with the review comments.
>> Changes for v3:
>>   - Converted the driver to use runtime_pm.
>> Changes for v2:
>>   - Removed the struct platform_device* from suspend/resume
>>     as suggested by Lothar.
>>
>>  drivers/net/can/xilinx_can.c |  157 ++++++++++++++++++++++++++++-------------
>>  1 files changed, 107 insertions(+), 50 deletions(-)
> [..]
>> +static int __maybe_unused xcan_runtime_resume(struct device *dev)
>>  {
>> -	struct platform_device *pdev = dev_get_drvdata(dev);
>> -	struct net_device *ndev = platform_get_drvdata(pdev);
>> +	struct net_device *ndev = dev_get_drvdata(dev);
>>  	struct xcan_priv *priv = netdev_priv(ndev);
>>  	int ret;
>> +	u32 isr, status;
>>  
>>  	ret = clk_enable(priv->bus_clk);
>>  	if (ret) {
>> @@ -1014,15 +1030,28 @@ static int __maybe_unused xcan_resume(struct device *dev)
>>  	ret = clk_enable(priv->can_clk);
>>  	if (ret) {
>>  		dev_err(dev, "Cannot enable clock.\n");
>> -		clk_disable_unprepare(priv->bus_clk);
>> +		clk_disable(priv->bus_clk);
> [...]
>> @@ -1173,12 +1219,23 @@ static int xcan_remove(struct platform_device *pdev)
>>  {
>>  	struct net_device *ndev = platform_get_drvdata(pdev);
>>  	struct xcan_priv *priv = netdev_priv(ndev);
>> +	int ret;
>> +
>> +	ret = pm_runtime_get_sync(&pdev->dev);
>> +	if (ret < 0) {
>> +		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
>> +				__func__, ret);
>> +		return ret;
>> +	}
>>  
>>  	if (set_reset_mode(ndev) < 0)
>>  		netdev_err(ndev, "mode resetting failed!\n");
>>  
>>  	unregister_candev(ndev);
>> +	pm_runtime_disable(&pdev->dev);
>>  	netif_napi_del(&priv->napi);
>> +	clk_disable_unprepare(priv->bus_clk);
>> +	clk_disable_unprepare(priv->can_clk);
> 
> Shouldn't pretty much all these occurrences of clk_disable/enable
> disappear? This should all be handled by the runtime_pm framework now.

We have:
- clk_prepare_enable() in probe
- clk_disable_unprepare() in remove
- clk_enable() in runtime_resume
- clk_disable() in runtime_suspend

Which is, as far as I understand, the right way to do it. Maybe
Kedareswara can post the clock debug output again with this patch
iteration. Have I missed something?
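
To illustrate why I think the counts balance, here is a toy userspace
model of the four call sites. It only tracks two counters per clock and
assumes the usual rule that a clock must be prepared before it is
enabled. struct clk_model and the model_*() helpers are invented for
this sketch and are not driver or clk framework code.

```c
#include <assert.h>

/* Toy model of one clock: just the two reference counts. */
struct clk_model {
	int prepare_count;
	int enable_count;
};

static void clk_model_prepare_enable(struct clk_model *c)
{
	c->prepare_count++;
	c->enable_count++;
}

static void clk_model_enable(struct clk_model *c)
{
	/* Enabling an unprepared clock would be a bug. */
	assert(c->prepare_count > 0);
	c->enable_count++;
}

static void clk_model_disable(struct clk_model *c)
{
	assert(c->enable_count > 0);
	c->enable_count--;
}

static void clk_model_disable_unprepare(struct clk_model *c)
{
	clk_model_disable(c);
	assert(c->prepare_count > 0);
	c->prepare_count--;
}

/* The four call sites listed above, one per driver callback. */
static void model_probe(struct clk_model *c)
{
	clk_model_prepare_enable(c);
}

static void model_runtime_suspend(struct clk_model *c)
{
	clk_model_disable(c);
}

static void model_runtime_resume(struct clk_model *c)
{
	clk_model_enable(c);
}

static void model_remove(struct clk_model *c)
{
	clk_model_disable_unprepare(c);
}
```

Running probe, runtime_suspend, runtime_resume (as triggered by
pm_runtime_get_sync() in remove) and then remove brings both counts
back to zero, and the clock stays prepared across suspend/resume.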

regards,
Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


^ permalink raw reply
