Netdev List

Netdev List
 help / color / mirror / Atom feed

* [patch -next] 6LoWPAN: double free in lowpan_fragment_xmit()
From: Dan Carpenter @ 2011-11-16  8:21 UTC (permalink / raw)
  To: Dmitry Eremin-Solenikov, Alexander Smirnov
  Cc: Sergey Lapin, David S. Miller, linux-zigbee-devel, netdev,
	kernel-janitors

dev_queue_xmit() consumes its own skb, so the call to dev_kfree_skb()
ieee802154/6lowpan.clowpan_fragment_xmits a double free.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 602f318..e4ecc1e 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -980,9 +980,6 @@ lowpan_fragment_xmit(struct sk_buff *skb, u8 *head,
 
 	ret = dev_queue_xmit(frag);
 
-	if (ret < 0)
-		dev_kfree_skb(frag);
-
 	return ret;
 }
 

^ permalink raw reply related

* Re: Hitting BUG_ON() from napi_enable in e1000e
From: Jeff Kirsher @ 2011-11-16  7:32 UTC (permalink / raw)
  To: Mike McElroy; +Cc: netdev
In-Reply-To: <4EC16DE3.5020701@stratus.com>

On Mon, Nov 14, 2011 at 11:37, Mike McElroy <mike.mcelroy@stratus.com> wrote:
>
> Hitting the BUG_ON in napi_enable(). Code inspection shows that this can
> only be triggered by calling napi_enable() twice without an intervening
> napi_disable().
>
> I saw the following sequence of events in the stack trace:
>
> 1) We simulated a cable pull using an Extreme switch.
> 2) e1000_tx_timeout() was entered.
> 3) e1000_reset_task() was called. Saw the message from e_err() in the
> console log.
> 4) e1000_reinit_locked was called. This function calls e1000_down() and
> e1000_up(). These functions call napi_disable() and napi_enable()
> respectively.
> 5) Then on another thread, a monitor task saw carrier was down and executed
> 'ip set link down' and 'ip set link up' commands.
> 6) Saw the '_E1000_RESETTING'warning fron the e1000_close function.
> 7) Either the e1000_open() executed between the e1000_down() and e1000_up()
> calls in step 4 or the e1000_open() call executed after the e0001_up() call.
> In either case, napi_enable() is called twice which triggers the BUG_ON.
>
> This code sequence is present in the e1000 driver also.
>
> There are two bugs here:
> 1) The napi_enable() and napi_disable() should only be called in the
> e1000_open and e1000_close functions respectively
> 2) There no synchronization preventing a call to the driver close while
> executing error processing.
>
> Here is a patch for the napi_enable BUG_ON:
>
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index 5ec1f99..e1af6fa 100755
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -4242,9 +4242,6 @@ int e1000e_up(struct e1000_adapter *adapter)
>
>        clear_bit(__E1000_DOWN, &adapter->state);
>
> -#ifdef CONFIG_E1000E_NAPI
> -       napi_enable(&adapter->napi);
> -#endif
>  #ifdef CONFIG_E1000E_MSIX
>        if (adapter->msix_entries)
>                e1000_configure_msix(adapter);
> @@ -4307,10 +4304,6 @@ void e1000e_down(struct e1000_adapter *adapter)
>        /* flush both disables and wait for them to finish */
>        e1e_flush();
>        usleep_range(10000, 20000);
> -
> -#ifdef CONFIG_E1000E_NAPI
> -       napi_disable(&adapter->napi);
> -#endif
>        e1000_irq_disable(adapter);
>
>        del_timer_sync(&adapter->watchdog_timer);
> @@ -4677,6 +4670,10 @@ static int e1000_close(struct net_device *netdev)
>
>        pm_runtime_get_sync(&pdev->dev);
>
> +#ifdef CONFIG_E1000E_NAPI
> +       napi_disable(&adapter->napi);
> +#endif
> +
>        if (!test_bit(__E1000_DOWN, &adapter->state)) {
>                e1000e_down(adapter);
>                e1000_free_irq(adapter);
>

Thanks, I will add this patch to my queue.

-- 
Cheers,
Jeff

^ permalink raw reply

* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Michael S. Tsirkin @ 2011-11-16  7:23 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar, gorcunov, kvm, Asias He, virtualization,
	Pekka Enberg, Sasha Levin, netdev, mingo, Stephen Hemminger
In-Reply-To: <877h31ortx.fsf@rustcorp.com.au>

On Wed, Nov 16, 2011 at 10:34:42AM +1030, Rusty Russell wrote:
> On Mon, 14 Nov 2011 15:05:07 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Nov 14, 2011 at 02:25:17PM +0200, Pekka Enberg wrote:
> > > On Mon, Nov 14, 2011 at 4:04 AM, Asias He <asias.hejun@gmail.com> wrote:
> > > > Why both the bandwidth and latency performance are dropping so dramatically
> > > > with multiple VQ?
> > > 
> > > What's the expected benefit from multiple VQs
> > 
> > Heh, the original patchset didn't mention this :) It really should.
> > They are supposed to speed up networking for high smp guests.
> 
> If we have one queue per guest CPU, does this allow us to run lockless?
> 
> Thanks,
> Rusty.

LLTX? It's supposed to be deprecated, isn't it?

-- 
MST

^ permalink raw reply

* Re: [PATCH V2] vlan:return error when real dev is enslaved
From: Weiping Pan @ 2011-11-16  7:16 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Patrick McHardy (maintainer:VLAN (802.1Q)),
	"David S. Miller" (maintainer:NETWORKING [GENERAL]),
	open list:VLAN (802.1Q), open list
In-Reply-To: <4EC2BB48.2070802@gmail.com>

On 11/16/2011 03:19 AM, Nicolas de Pesloüan wrote:
> Le 15/11/2011 13:44, Weiping Pan a écrit :
>> Qinhuibin reported a kernel panic when he do some operation about vlan.
>> https://lkml.org/lkml/2011/11/6/218
>>
>> The operation is as below:
>> ifconfig eth2 up
>> modprobe bonding
>> modprobe 8021q
>> ifconfig bond0 up
>> ifenslave bond0 eth2
>> vconfig add eth2 3300
>> vconfig add bond0 33
>> vconfig rem eth2.3300
>>
>> the panic stack is as below:
>> [<ffffffffa002f1c9>] panic_event+0x49/0x70 [ipmi_msghandler]
>> [<ffffffff80378917>] notifier_call_chain+0x37/0x70
>> [<ffffffff80372122>] panic+0xa2/0x195
>> [<ffffffff80376ed8>] oops_end+0xd8/0x140
>> [<ffffffff8001bea7>] no_context+0xf7/0x280
>> [<ffffffff8001c1a5>] __bad_area_nosemaphore+0x175/0x250
>> [<ffffffff80376318>] page_fault+0x28/0x30
>> [<ffffffffa039dabd>] igb_vlan_rx_kill_vid+0x4d/0x100 [igb]
>> [<ffffffffa044045f>] bond_vlan_rx_kill_vid+0x9f/0x290 [bonding]
>> [<ffffffffa047e636>] unregister_vlan_dev+0x136/0x180 [8021q]
>> [<ffffffffa047ed20>] vlan_ioctl_handler+0x170/0x3f0 [8021q]
>> [<ffffffff802c1d3f>] sock_ioctl+0x21f/0x280
>> [<ffffffff800e6d7f>] vfs_ioctl+0x2f/0xb0
>> [<ffffffff800e726b>] do_vfs_ioctl+0x3cb/0x5a0
>> [<ffffffff800e74e1>] sys_ioctl+0xa1/0xb0
>> [<ffffffff80007388>] system_call_fastpath+0x16/0x1b
>> [<00007f108a2b8bd7>] 0x7f108a2b8bd7
>> And the nic is as below:
>> [root@localhost ~]# ethtool -i eth2
>> driver: igb
>> version: 3.0.6-k2
>> firmware-version: 1.2-1
>> bus-info: 0000:04:00.0
>> kernel version：
>> 2.6.32.12-0.7 also happen in 2.6.32-131
>>
>> For kernel 2.6.32, the reason of this bug is that when we do "vconfig 
>> add bond0 33",
>> adapter->vlgrp is overwritten in igb_vlan_rx_register. So when we do 
>> "vconfig rem
>> eth2.3300", it can't find the correct vlgrp.
>>
>> And this bug is avoided by vlan cleanup patchset from Jiri Pirko
>> <jpirko@redhat.com>, especially commit b2cb09b1a772(igb: do vlan 
>> cleanup).
>>
>> But it is not a correct operation to creat a vlan interface on eth2
>> when it have been enslaved by bond0, so this patch is to return error
>> when the real dev is already enslaved.
>
> Why isn't this setup correct?
>
> Compare to bridge, where ebtables allow for some sort of sharing of 
> the physical interface between bridge and vlan.
>
> I think bonding should behave the same way instead of denying this setup.
>
>     Nicolas.
>
Hi, Nicolas,

After some investigation I agree with you that this setup is correct,
since we can  "switchport trunk allowed vlan 2-3" on the switch.

I can confirm that both bonding and bridge support this kind of setup.

bond0.2                                        br0.2
       |                                                |
   eth0 ----- eth0.3                          eth0 -----eth0.3

Both works fine(git commit 06236ac3726f1),  so please discard this patch.

thanks
Weiping Pan

^ permalink raw reply

* Re: [PATCH] Phonet: set the pipe handle using setsockopt
From: Rémi Denis-Courmont @ 2011-11-16  6:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <81C3A93C17462B4BBD7E272753C105791FB090B0B8@EXDCVYMBSTM005.EQ1STM.local>

Le Lundi 14 Novembre 2011 11:36:12 ext Hemant-vilas RAMDASI a écrit :
 > > sockaddr_pn *spn) /* Phonet device ioctl requests */
> > > 
> > >  #ifdef __KERNEL__
> > >  #define SIOCPNGAUTOCONF		(SIOCDEVPRIVATE + 0)
> > > 
> > > +#define SIOPNPIPE_ENABLE	_IO(SIOCPNGAUTOCONF,   1)
> > 
> > Does this even work? I am not an expert on this, but I would think that
> > device-private controls are routed to the network device, not the
> > socket. In
> > any case, it does not seem right.
> 
> Yes, it works. The ioctl is routed to per-socket functions.

Even if it works, sockets are probably not supposed to use the device-private 
ioctl() range, are they?

And why is this inside __KERNEL__ ?

> > > @@ -994,6 +1068,17 @@ static int pep_getsockopt(struct sock *sk, int
> > 
> > level,
> > 
> > > int optname, return -EINVAL;
> > > 
> > >  		break;
> > > 
> > > +	case PNPIPE_ENABLE:
> > > +		if (sk->sk_state == TCP_ESTABLISHED)
> > > +			val = 1;
> > > +		else
> > > +			val = 0;
> > > +		break;
> > 
> > Do you still need this read-only option?
> 
> Yes.

Why and how?

-- 
Rémi Denis-Courmont
http://www.remlab.net/

^ permalink raw reply

* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: jason wang @ 2011-11-16  6:10 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: penberg, kvm, Michael S. Tsirkin, Asias He, virtualization,
	gorcunov, Sasha Levin, netdev, mingo
In-Reply-To: <OFDA747DDD.8D1C8FD8-ON65257949.001837DA-65257949.0019D25F@in.ibm.com>

On 11/15/2011 12:44 PM, Krishna Kumar2 wrote:
> Sasha Levin <levinsasha928@gmail.com> wrote on 11/14/2011 03:45:40 PM:
>
>>> Why both the bandwidth and latency performance are dropping so
>>> dramatically with multiple VQ?
>> It looks like theres no hash sync between host and guest, which makes
>> the RX VQ change for every packet. This is my guess.
> Yes, I confirmed this happens for macvtap. I am
> using ixgbe - it calls skb_record_rx_queue when
> a skb is allocated, but sets rxhash when a packet
> arrives. Macvtap is relying on record_rx_queue
> first ahead of rxhash (as part of my patch making
> macvtap multiqueue), hence different skbs result
> in macvtap selecting different vq's.
>
> Reordering macvtap to use rxhash first results in
> all packets going to the same VQ. The code snippet
> is:
>
> {
> 	...
> 	if (!numvtaps)
>                 goto out;
>
> 	rxq = skb_get_rxhash(skb);
> 	if (rxq) {
> 		tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
> 		if (tap)
> 			goto out;
> 	}
>
> 	if (likely(skb_rx_queue_recorded(skb))) {
> 		rxq = skb_get_rx_queue(skb);
>
> 		while (unlikely(rxq >= numvtaps))
> 			rxq -= numvtaps;
> 			tap = rcu_dereference(vlan->taps[rxq]);
> 			if (tap)
> 				goto out;
> 	}
> }
>
> I will submit a patch for macvtap separately. I am working
> towards the other issue pointed out - different vhost
> threads handling rx/tx of a single flow.
Hello Krishna:

Have any thought in mind to solve the issue of flow handling?

Maybe some performance numbers first is better, it would let us know
where we are. During the test of my patchset, I find big regression of
small packet transmission, and more retransmissions were noticed. This
maybe also the issue of flow affinity. One interesting things is to see
whether this happens in your patches :)

I've played with a basic flow director implementation based on my series
which want to make sure the packets of a flow was handled by the same
vhost thread/guest vcpu. This is done by:

- bind virtqueue to guest cpu
- record the hash to queue mapping when guest sending packets and use
this mapping to choose the virtqueue when forwarding packets to guest

Test shows some help during for receiving packets from external host and
packet sending to local host. But it would hurt the performance of
sending packets to remote host. This is not the perfect solution as it
can not handle guest moving processes among vcpus, I plan to try
accelerate RFS and sharing the mapping between host and guest.

Anyway this is just for receiving, the small packet sending need more
thoughts.

Thanks

>
> thanks,
>
> - KK
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v5] Phonet: set the pipe handle using setsockopt
From: Hemant Vilas RAMDASI @ 2011-11-16  5:44 UTC (permalink / raw)
  To: netdev-owner
  Cc: netdev, remi.denis-courmont, Dinesh Kumar Sharma, Hemant Ramdasi

From: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>

This provides flexibility to set the pipe handle
using setsockopt. The pipe can be enabled (if disabled) later
using ioctl.

Signed-off-by: Hemant Ramdasi <hemant.ramdasi@stericsson.com>
Signed-off-by: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>
---
 include/linux/phonet.h |    3 +
 net/phonet/pep.c       |  103 ++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 99 insertions(+), 7 deletions(-)

diff --git a/include/linux/phonet.h b/include/linux/phonet.h
index 6fb1384..4c00551 100644
--- a/include/linux/phonet.h
+++ b/include/linux/phonet.h
@@ -37,6 +37,8 @@
 #define PNPIPE_ENCAP		1
 #define PNPIPE_IFINDEX		2
 #define PNPIPE_HANDLE		3
+#define PNPIPE_ENABLE		4
+#define PNPIPE_INITSTATE	5
 
 #define PNADDR_ANY		0
 #define PNADDR_BROADCAST	0xFC
@@ -180,6 +182,7 @@ static inline __u8 pn_sockaddr_get_resource(const struct sockaddr_pn *spn)
 /* Phonet device ioctl requests */
 #ifdef __KERNEL__
 #define SIOCPNGAUTOCONF		(SIOCDEVPRIVATE + 0)
+#define SIOPNPIPE_ENABLE	_IO(SIOCPNGAUTOCONF,   1)
 
 struct if_phonet_autoconf {
 	uint8_t device;
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index f17fd84..6019b7e 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -533,6 +533,29 @@ static int pep_connresp_rcv(struct sock *sk, struct sk_buff *skb)
 	return pipe_handler_send_created_ind(sk);
 }
 
+static int pep_enableresp_rcv(struct sock *sk, struct sk_buff *skb)
+{
+	struct pnpipehdr *hdr = pnp_hdr(skb);
+
+	if (hdr->error_code != PN_PIPE_NO_ERROR)
+		return -ECONNREFUSED;
+
+	return pep_indicate(sk, PNS_PIPE_ENABLED_IND, 0 /* sub-blocks */,
+		NULL, 0, GFP_ATOMIC);
+
+}
+
+static void pipe_start_flow_control(struct sock *sk)
+{
+	struct pep_sock *pn = pep_sk(sk);
+
+	if (!pn_flow_safe(pn->tx_fc)) {
+		atomic_set(&pn->tx_credits, 1);
+		sk->sk_write_space(sk);
+	}
+	pipe_grant_credits(sk, GFP_ATOMIC);
+}
+
 /* Queue an skb to an actively connected sock.
  * Socket lock must be held. */
 static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
@@ -578,13 +601,25 @@ static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
 			sk->sk_state = TCP_CLOSE_WAIT;
 			break;
 		}
+		if (pn->init_enable == PN_PIPE_DISABLE)
+			sk->sk_state = TCP_SYN_RECV;
+		else {
+			sk->sk_state = TCP_ESTABLISHED;
+			pipe_start_flow_control(sk);
+		}
+		break;
 
-		sk->sk_state = TCP_ESTABLISHED;
-		if (!pn_flow_safe(pn->tx_fc)) {
-			atomic_set(&pn->tx_credits, 1);
-			sk->sk_write_space(sk);
+	case PNS_PEP_ENABLE_RESP:
+		if (sk->sk_state != TCP_SYN_SENT)
+			break;
+
+		if (pep_enableresp_rcv(sk, skb)) {
+			sk->sk_state = TCP_CLOSE_WAIT;
+			break;
 		}
-		pipe_grant_credits(sk, GFP_ATOMIC);
+
+		sk->sk_state = TCP_ESTABLISHED;
+		pipe_start_flow_control(sk);
 		break;
 
 	case PNS_PEP_DISCONNECT_RESP:
@@ -863,14 +898,32 @@ static int pep_sock_connect(struct sock *sk, struct sockaddr *addr, int len)
 	int err;
 	u8 data[4] = { 0 /* sub-blocks */, PAD, PAD, PAD };
 
-	pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+	if (pn->pipe_handle == PN_PIPE_INVALID_HANDLE)
+		pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+
 	err = pipe_handler_request(sk, PNS_PEP_CONNECT_REQ,
-					PN_PIPE_ENABLE, data, 4);
+				pn->init_enable, data, 4);
 	if (err) {
 		pn->pipe_handle = PN_PIPE_INVALID_HANDLE;
 		return err;
 	}
+
+	sk->sk_state = TCP_SYN_SENT;
+
+	return 0;
+}
+
+static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
+{
+	int err;
+
+	err = pipe_handler_request(sk, PNS_PEP_ENABLE_REQ, PAD,
+				NULL, 0);
+	if (err)
+		return err;
+
 	sk->sk_state = TCP_SYN_SENT;
+
 	return 0;
 }
 
@@ -894,6 +947,19 @@ static int pep_ioctl(struct sock *sk, int cmd, unsigned long arg)
 			answ = 0;
 		release_sock(sk);
 		return put_user(answ, (int __user *)arg);
+		break;
+
+	case SIOPNPIPE_ENABLE:
+		lock_sock(sk);
+		if (sk->sk_state == TCP_SYN_SENT)
+			answ =  -EBUSY;
+		else if (sk->sk_state == TCP_ESTABLISHED)
+			answ = -EISCONN;
+		else
+			answ = pep_sock_enable(sk, NULL, 0);
+		release_sock(sk);
+		return answ;
+		break;
 	}
 
 	return -ENOIOCTLCMD;
@@ -959,6 +1025,18 @@ static int pep_setsockopt(struct sock *sk, int level, int optname,
 		}
 		goto out_norel;
 
+	case PNPIPE_HANDLE:
+		if ((sk->sk_state == TCP_CLOSE) &&
+			(val >= 0) && (val < PN_PIPE_INVALID_HANDLE))
+			pn->pipe_handle = val;
+		else
+			err = -EINVAL;
+		break;
+
+	case PNPIPE_INITSTATE:
+		pn->init_enable = !!val;
+		break;
+
 	default:
 		err = -ENOPROTOOPT;
 	}
@@ -994,6 +1072,17 @@ static int pep_getsockopt(struct sock *sk, int level, int optname,
 			return -EINVAL;
 		break;
 
+	case PNPIPE_ENABLE:
+		if (sk->sk_state == TCP_ESTABLISHED)
+			val = 1;
+		else
+			val = 0;
+		break;
+
+	case PNPIPE_INITSTATE:
+		val = pn->init_enable;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
1.7.4.3

^ permalink raw reply related

* [PATCH net-next v5 06/10] forcedeth: allow to silence "TX timeout" debug messages
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, Sameer Nanda, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

From: Sameer Nanda <snanda@google.com>

This adds a new module parameter "debug_tx_timeout" to silence most
debug messages in case of TX timeout. These messages don't provide a
signal/noise ratio high enough for production systems and, with ~30kB
logged each time, they tend to add to a cascade effect if the system
is already under stress (memory pressure, disk, etc.).

By default, the parameter is clear, meaning that only a single warning
will be reported.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |   98 ++++++++++++++++++-------------
 1 files changed, 57 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index fe17e42..9b917ff 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -892,6 +892,11 @@ enum {
 static int dma_64bit = NV_DMA_64BIT_ENABLED;
 
 /*
+ * Debug output control for tx_timeout
+ */
+static bool debug_tx_timeout = false;
+
+/*
  * Crossover Detection
  * Realtek 8201 phy + some OEM boards do not work properly.
  */
@@ -2477,56 +2482,64 @@ static void nv_tx_timeout(struct net_device *dev)
 	u32 status;
 	union ring_type put_tx;
 	int saved_tx_limit;
-	int i;
 
 	if (np->msi_flags & NV_MSI_X_ENABLED)
 		status = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQSTAT_MASK;
 	else
 		status = readl(base + NvRegIrqStatus) & NVREG_IRQSTAT_MASK;
 
-	netdev_info(dev, "Got tx_timeout. irq: %08x\n", status);
+	netdev_warn(dev, "Got tx_timeout. irq status: %08x\n", status);
 
-	netdev_info(dev, "Ring at %lx\n", (unsigned long)np->ring_addr);
-	netdev_info(dev, "Dumping tx registers\n");
-	for (i = 0; i <= np->register_size; i += 32) {
-		netdev_info(dev,
-			    "%3x: %08x %08x %08x %08x %08x %08x %08x %08x\n",
-			    i,
-			    readl(base + i + 0), readl(base + i + 4),
-			    readl(base + i + 8), readl(base + i + 12),
-			    readl(base + i + 16), readl(base + i + 20),
-			    readl(base + i + 24), readl(base + i + 28));
-	}
-	netdev_info(dev, "Dumping tx ring\n");
-	for (i = 0; i < np->tx_ring_size; i += 4) {
-		if (!nv_optimized(np)) {
-			netdev_info(dev,
-				    "%03x: %08x %08x // %08x %08x // %08x %08x // %08x %08x\n",
-				    i,
-				    le32_to_cpu(np->tx_ring.orig[i].buf),
-				    le32_to_cpu(np->tx_ring.orig[i].flaglen),
-				    le32_to_cpu(np->tx_ring.orig[i+1].buf),
-				    le32_to_cpu(np->tx_ring.orig[i+1].flaglen),
-				    le32_to_cpu(np->tx_ring.orig[i+2].buf),
-				    le32_to_cpu(np->tx_ring.orig[i+2].flaglen),
-				    le32_to_cpu(np->tx_ring.orig[i+3].buf),
-				    le32_to_cpu(np->tx_ring.orig[i+3].flaglen));
-		} else {
+	if (unlikely(debug_tx_timeout)) {
+		int i;
+
+		netdev_info(dev, "Ring at %lx\n", (unsigned long)np->ring_addr);
+		netdev_info(dev, "Dumping tx registers\n");
+		for (i = 0; i <= np->register_size; i += 32) {
 			netdev_info(dev,
-				    "%03x: %08x %08x %08x // %08x %08x %08x // %08x %08x %08x // %08x %08x %08x\n",
+				    "%3x: %08x %08x %08x %08x "
+				    "%08x %08x %08x %08x\n",
 				    i,
-				    le32_to_cpu(np->tx_ring.ex[i].bufhigh),
-				    le32_to_cpu(np->tx_ring.ex[i].buflow),
-				    le32_to_cpu(np->tx_ring.ex[i].flaglen),
-				    le32_to_cpu(np->tx_ring.ex[i+1].bufhigh),
-				    le32_to_cpu(np->tx_ring.ex[i+1].buflow),
-				    le32_to_cpu(np->tx_ring.ex[i+1].flaglen),
-				    le32_to_cpu(np->tx_ring.ex[i+2].bufhigh),
-				    le32_to_cpu(np->tx_ring.ex[i+2].buflow),
-				    le32_to_cpu(np->tx_ring.ex[i+2].flaglen),
-				    le32_to_cpu(np->tx_ring.ex[i+3].bufhigh),
-				    le32_to_cpu(np->tx_ring.ex[i+3].buflow),
-				    le32_to_cpu(np->tx_ring.ex[i+3].flaglen));
+				    readl(base + i + 0), readl(base + i + 4),
+				    readl(base + i + 8), readl(base + i + 12),
+				    readl(base + i + 16), readl(base + i + 20),
+				    readl(base + i + 24), readl(base + i + 28));
+		}
+		netdev_info(dev, "Dumping tx ring\n");
+		for (i = 0; i < np->tx_ring_size; i += 4) {
+			if (!nv_optimized(np)) {
+				netdev_info(dev,
+					    "%03x: %08x %08x // %08x %08x "
+					    "// %08x %08x // %08x %08x\n",
+					    i,
+					    le32_to_cpu(np->tx_ring.orig[i].buf),
+					    le32_to_cpu(np->tx_ring.orig[i].flaglen),
+					    le32_to_cpu(np->tx_ring.orig[i+1].buf),
+					    le32_to_cpu(np->tx_ring.orig[i+1].flaglen),
+					    le32_to_cpu(np->tx_ring.orig[i+2].buf),
+					    le32_to_cpu(np->tx_ring.orig[i+2].flaglen),
+					    le32_to_cpu(np->tx_ring.orig[i+3].buf),
+					    le32_to_cpu(np->tx_ring.orig[i+3].flaglen));
+			} else {
+				netdev_info(dev,
+					    "%03x: %08x %08x %08x "
+					    "// %08x %08x %08x "
+					    "// %08x %08x %08x "
+					    "// %08x %08x %08x\n",
+					    i,
+					    le32_to_cpu(np->tx_ring.ex[i].bufhigh),
+					    le32_to_cpu(np->tx_ring.ex[i].buflow),
+					    le32_to_cpu(np->tx_ring.ex[i].flaglen),
+					    le32_to_cpu(np->tx_ring.ex[i+1].bufhigh),
+					    le32_to_cpu(np->tx_ring.ex[i+1].buflow),
+					    le32_to_cpu(np->tx_ring.ex[i+1].flaglen),
+					    le32_to_cpu(np->tx_ring.ex[i+2].bufhigh),
+					    le32_to_cpu(np->tx_ring.ex[i+2].buflow),
+					    le32_to_cpu(np->tx_ring.ex[i+2].flaglen),
+					    le32_to_cpu(np->tx_ring.ex[i+3].bufhigh),
+					    le32_to_cpu(np->tx_ring.ex[i+3].buflow),
+					    le32_to_cpu(np->tx_ring.ex[i+3].flaglen));
+			}
 		}
 	}
 
@@ -6156,6 +6169,9 @@ module_param(phy_cross, int, 0);
 MODULE_PARM_DESC(phy_cross, "Phy crossover detection for Realtek 8201 phy is enabled by setting to 1 and disabled by setting to 0.");
 module_param(phy_power_down, int, 0);
 MODULE_PARM_DESC(phy_power_down, "Power down phy and disable link when interface is down (1), or leave phy powered up (0).");
+module_param(debug_tx_timeout, bool, 0);
+MODULE_PARM_DESC(debug_tx_timeout,
+		 "Dump tx related registers and ring when tx_timeout happens");
 
 MODULE_AUTHOR("Manfred Spraul <manfred@colorfullife.com>");
 MODULE_DESCRIPTION("Reverse Engineered nForce ethernet driver");
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 07/10] forcedeth: implement ndo_get_stats64() API
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

This commit implements the ndo_get_stats64() API for forcedeth. Since
these stats are being updated from different contexts (process and
timer), this commit adds protection (locking + atomic variables).

Tested:
  - 16-way SMP x86_64 ->
    RX bytes:7244556582 (7.2 GB)  TX bytes:181904254 (181.9 MB)
  - pktgen + loopback: identical rx_bytes/tx_bytes and rx_packets/tx_packets



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |  182 ++++++++++++++++++++++++-------
 1 files changed, 141 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 9b917ff..6aeb0d6 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -692,6 +692,21 @@ struct nv_ethtool_stats {
 #define NV_DEV_STATISTICS_V2_COUNT (NV_DEV_STATISTICS_V3_COUNT - 3)
 #define NV_DEV_STATISTICS_V1_COUNT (NV_DEV_STATISTICS_V2_COUNT - 6)
 
+/* driver statistics */
+struct nv_driver_stat {
+	atomic_t delta;  /* increase since last nv_update_stats() */
+	u64 total;  /* cumulative, requires netdev_priv(dev)->stats_lock */
+};
+
+#define NV_DRIVER_STAT_ATOMIC_INC(ptr_stat) /* atomic */ \
+	({ atomic_inc(&(ptr_stat)->delta); })
+#define NV_DRIVER_STAT_ATOMIC_ADD(ptr_stat,increment) /* atomic */	\
+	({ atomic_add((increment), &(ptr_stat)->delta); })
+#define NV_DRIVER_STAT_UPDATE_TOTAL(ptr_stat) /* requires stats_lock */ \
+	({ (ptr_stat)->total += atomic_xchg(&(ptr_stat)->delta, 0); })
+#define NV_DRIVER_STAT_GET_TOTAL(ptr_stat) /* requires stats_lock */ \
+	((ptr_stat)->total)
+
 /* diagnostics */
 #define NV_TEST_COUNT_BASE 3
 #define NV_TEST_COUNT_EXTENDED 4
@@ -736,6 +751,12 @@ struct nv_skb_map {
  * - tx setup is lockless: it relies on netif_tx_lock. Actual submission
  *	needs netdev_priv(dev)->lock :-(
  * - set_multicast_list: preparation lockless, relies on netif_tx_lock.
+ *
+ * Stats are protected with stats_lock:
+ * - updated by nv_do_stats_poll (timer). This is meant to avoid
+ *   integer wraparound in the NIC stats registers, at low frequency
+ *   (0.1 Hz)
+ * - updated by nv_get_ethtool_stats + nv_get_stats64
  */
 
 /* in dev: base, irq */
@@ -745,9 +766,10 @@ struct fe_priv {
 	struct net_device *dev;
 	struct napi_struct napi;
 
-	/* General data:
-	 * Locking: spin_lock(&np->lock); */
+	/* stats are updated in syscall and timer */
+	spinlock_t stats_lock;
 	struct nv_ethtool_stats estats;
+
 	int in_shutdown;
 	u32 linkspeed;
 	int duplex;
@@ -798,6 +820,11 @@ struct fe_priv {
 	u32 nic_poll_irq;
 	int rx_ring_size;
 
+	/* RX software stats */
+	struct nv_driver_stat stat_rx_packets;
+	struct nv_driver_stat stat_rx_bytes; /* not always available in HW */
+	struct nv_driver_stat stat_rx_missed_errors;
+
 	/* media detection workaround.
 	 * Locking: Within irq hander or disable_irq+spin_lock(&np->lock);
 	 */
@@ -820,6 +847,11 @@ struct fe_priv {
 	struct nv_skb_map *tx_end_flip;
 	int tx_stop;
 
+	/* TX software stats */
+	struct nv_driver_stat stat_tx_packets; /* not always available in HW */
+	struct nv_driver_stat stat_tx_bytes;
+	struct nv_driver_stat stat_tx_dropped;
+
 	/* msi/msi-x fields */
 	u32 msi_flags;
 	struct msix_entry msi_x_entry[NV_MSI_X_MAX_VECTORS];
@@ -1635,11 +1667,19 @@ static void nv_mac_reset(struct net_device *dev)
 	pci_push(base);
 }
 
-static void nv_get_hw_stats(struct net_device *dev)
+/* Caller must appropriately lock netdev_priv(dev)->stats_lock */
+static void nv_update_stats(struct net_device *dev)
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 
+	/* If it happens that this is run in top-half context, then
+	 * replace the spin_lock of stats_lock with
+	 * spin_lock_irqsave() in calling functions. */
+	WARN_ONCE(in_irq(), "forcedeth: estats spin_lock(_bh) from top-half");
+	assert_spin_locked(&np->stats_lock);
+
+	/* query hardware */
 	np->estats.tx_bytes += readl(base + NvRegTxCnt);
 	np->estats.tx_zero_rexmt += readl(base + NvRegTxZeroReXmt);
 	np->estats.tx_one_rexmt += readl(base + NvRegTxOneReXmt);
@@ -1695,21 +1735,35 @@ static void nv_get_hw_stats(struct net_device *dev)
 		np->estats.tx_multicast += readl(base + NvRegTxMulticast);
 		np->estats.tx_broadcast += readl(base + NvRegTxBroadcast);
 	}
+
+	/* update software stats */
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_packets);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_bytes);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_missed_errors);
+
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_packets);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_bytes);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_dropped);
 }
 
 /*
- * nv_get_stats: dev->get_stats function
+ * nv_get_stats64: dev->ndo_get_stats64 function
  * Get latest stats value from the nic.
  * Called with read_lock(&dev_base_lock) held for read -
  * only synchronized against unregister_netdevice.
  */
-static struct net_device_stats *nv_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64*
+nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
+	__acquires(&netdev_priv(dev)->stats_lock)
+	__releases(&netdev_priv(dev)->stats_lock)
 {
 	struct fe_priv *np = netdev_priv(dev);
 
 	/* If the nic supports hw counters then retrieve latest values */
-	if (np->driver_data & (DEV_HAS_STATISTICS_V1|DEV_HAS_STATISTICS_V2|DEV_HAS_STATISTICS_V3)) {
-		nv_get_hw_stats(dev);
+	if (np->driver_data & DEV_HAS_STATISTICS_V123) {
+		spin_lock_bh(&np->stats_lock);
+
+		nv_update_stats(dev);
 
 		/*
 		 * Note: because HW stats are not always available and
@@ -1721,17 +1775,40 @@ static struct net_device_stats *nv_get_stats(struct net_device *dev)
 		 * packet (Ethernet FCS CRC).
 		 */
 
-		/* copy to net_device stats */
-		dev->stats.tx_fifo_errors = np->estats.tx_fifo_errors;
-		dev->stats.tx_carrier_errors = np->estats.tx_carrier_errors;
-		dev->stats.rx_crc_errors = np->estats.rx_crc_errors;
-		dev->stats.rx_over_errors = np->estats.rx_over_errors;
-		dev->stats.rx_fifo_errors = np->estats.rx_drop_frame;
-		dev->stats.rx_errors = np->estats.rx_errors_total;
-		dev->stats.tx_errors = np->estats.tx_errors_total;
-	}
-
-	return &dev->stats;
+		/* generic stats */
+		storage->rx_packets = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_packets);
+		storage->tx_packets = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_tx_packets);
+		storage->rx_bytes   = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_bytes);
+		storage->tx_bytes   = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_tx_bytes);
+		storage->rx_errors  = np->estats.rx_errors_total;
+		storage->tx_errors  = np->estats.tx_errors_total;
+		storage->tx_dropped = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_tx_dropped);
+
+		/* meaningful only when NIC supports stats v3 */
+		storage->multicast  = np->estats.rx_multicast;
+
+		/* detailed rx_errors */
+		storage->rx_length_errors = np->estats.rx_length_error;
+		storage->rx_over_errors   = np->estats.rx_over_errors;
+		storage->rx_crc_errors    = np->estats.rx_crc_errors;
+		storage->rx_frame_errors  = np->estats.rx_frame_align_error;
+		storage->rx_fifo_errors   = np->estats.rx_drop_frame;
+		storage->rx_missed_errors = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_missed_errors);
+
+		/* detailed tx_errors */
+		storage->tx_carrier_errors = np->estats.tx_carrier_errors;
+		storage->tx_fifo_errors    = np->estats.tx_fifo_errors;
+
+		spin_unlock_bh(&np->stats_lock);
+	}
+
+	return storage;
 }
 
 /*
@@ -1933,7 +2010,7 @@ static void nv_drain_tx(struct net_device *dev)
 			np->tx_ring.ex[i].buflow = 0;
 		}
 		if (nv_release_txskb(np, &np->tx_skb[i]))
-			dev->stats.tx_dropped++;
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_tx_dropped);
 		np->tx_skb[i].dma = 0;
 		np->tx_skb[i].dma_len = 0;
 		np->tx_skb[i].dma_single = 0;
@@ -2390,11 +2467,15 @@ static int nv_tx_done(struct net_device *dev, int limit)
 		if (np->desc_ver == DESC_VER_1) {
 			if (flags & NV_TX_LASTPACKET) {
 				if (flags & NV_TX_ERROR) {
-					if ((flags & NV_TX_RETRYERROR) && !(flags & NV_TX_RETRYCOUNT_MASK))
+					if ((flags & NV_TX_RETRYERROR)
+					    && !(flags & NV_TX_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
 				} else {
-					dev->stats.tx_packets++;
-					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
+					NV_DRIVER_STAT_ATOMIC_INC(
+						&np->stat_tx_packets);
+					NV_DRIVER_STAT_ATOMIC_ADD(
+						&np->stat_tx_bytes,
+						np->get_tx_ctx->skb->len);
 				}
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
@@ -2403,11 +2484,15 @@ static int nv_tx_done(struct net_device *dev, int limit)
 		} else {
 			if (flags & NV_TX2_LASTPACKET) {
 				if (flags & NV_TX2_ERROR) {
-					if ((flags & NV_TX2_RETRYERROR) && !(flags & NV_TX2_RETRYCOUNT_MASK))
+					if ((flags & NV_TX2_RETRYERROR)
+					    && !(flags & NV_TX2_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
 				} else {
-					dev->stats.tx_packets++;
-					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
+					NV_DRIVER_STAT_ATOMIC_INC(
+						&np->stat_tx_packets);
+					NV_DRIVER_STAT_ATOMIC_ADD(
+						&np->stat_tx_bytes,
+						np->get_tx_ctx->skb->len);
 				}
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
@@ -2441,15 +2526,18 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit)
 
 		if (flags & NV_TX2_LASTPACKET) {
 			if (flags & NV_TX2_ERROR) {
-				if ((flags & NV_TX2_RETRYERROR) && !(flags & NV_TX2_RETRYCOUNT_MASK)) {
+				if ((flags & NV_TX2_RETRYERROR)
+				    && !(flags & NV_TX2_RETRYCOUNT_MASK)) {
 					if (np->driver_data & DEV_HAS_GEAR_MODE)
 						nv_gear_backoff_reseed(dev);
 					else
 						nv_legacybackoff_reseed(dev);
 				}
 			} else {
-				dev->stats.tx_packets++;
-				dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
+				NV_DRIVER_STAT_ATOMIC_INC(&np->stat_tx_packets);
+				NV_DRIVER_STAT_ATOMIC_ADD(
+					&np->stat_tx_bytes,
+					np->get_tx_ctx->skb->len);
 			}
 
 			dev_kfree_skb_any(np->get_tx_ctx->skb);
@@ -2663,7 +2751,7 @@ static int nv_rx_process(struct net_device *dev, int limit)
 					/* the rest are hard errors */
 					else {
 						if (flags & NV_RX_MISSEDFRAME)
-							dev->stats.rx_missed_errors++;
+							NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_missed_errors);
 						dev_kfree_skb(skb);
 						goto next_pkt;
 					}
@@ -2706,8 +2794,8 @@ static int nv_rx_process(struct net_device *dev, int limit)
 		skb_put(skb, len);
 		skb->protocol = eth_type_trans(skb, dev);
 		napi_gro_receive(&np->napi, skb);
-		dev->stats.rx_packets++;
-		dev->stats.rx_bytes += len;
+		NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_packets);
+		NV_DRIVER_STAT_ATOMIC_ADD(&np->stat_rx_bytes, len);
 next_pkt:
 		if (unlikely(np->get_rx.orig++ == np->last_rx.orig))
 			np->get_rx.orig = np->first_rx.orig;
@@ -2790,8 +2878,8 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 				__vlan_hwaccel_put_tag(skb, vid);
 			}
 			napi_gro_receive(&np->napi, skb);
-			dev->stats.rx_packets++;
-			dev->stats.rx_bytes += len;
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_packets);
+			NV_DRIVER_STAT_ATOMIC_ADD(&np->stat_rx_bytes, len);
 		} else {
 			dev_kfree_skb(skb);
 		}
@@ -4000,11 +4088,18 @@ static void nv_poll_controller(struct net_device *dev)
 #endif
 
 static void nv_do_stats_poll(unsigned long data)
+	__acquires(&netdev_priv(dev)->stats_lock)
+	__releases(&netdev_priv(dev)->stats_lock)
 {
 	struct net_device *dev = (struct net_device *) data;
 	struct fe_priv *np = netdev_priv(dev);
 
-	nv_get_hw_stats(dev);
+	/* If lock is currently taken, the stats are being refreshed
+	 * and hence fresh enough */
+	if (spin_trylock(&np->stats_lock)) {
+		nv_update_stats(dev);
+		spin_unlock(&np->stats_lock);
+	}
 
 	if (!np->in_shutdown)
 		mod_timer(&np->stats_poll,
@@ -4711,14 +4806,18 @@ static int nv_get_sset_count(struct net_device *dev, int sset)
 	}
 }
 
-static void nv_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *estats, u64 *buffer)
+static void nv_get_ethtool_stats(struct net_device *dev,
+				 struct ethtool_stats *estats, u64 *buffer)
+	__acquires(&netdev_priv(dev)->stats_lock)
+	__releases(&netdev_priv(dev)->stats_lock)
 {
 	struct fe_priv *np = netdev_priv(dev);
 
-	/* update stats */
-	nv_get_hw_stats(dev);
-
-	memcpy(buffer, &np->estats, nv_get_sset_count(dev, ETH_SS_STATS)*sizeof(u64));
+	spin_lock_bh(&np->stats_lock);
+	nv_update_stats(dev);
+	memcpy(buffer, &np->estats,
+	       nv_get_sset_count(dev, ETH_SS_STATS)*sizeof(u64));
+	spin_unlock_bh(&np->stats_lock);
 }
 
 static int nv_link_test(struct net_device *dev)
@@ -5362,7 +5461,7 @@ static int nv_close(struct net_device *dev)
 static const struct net_device_ops nv_netdev_ops = {
 	.ndo_open		= nv_open,
 	.ndo_stop		= nv_close,
-	.ndo_get_stats		= nv_get_stats,
+	.ndo_get_stats64	= nv_get_stats64,
 	.ndo_start_xmit		= nv_start_xmit,
 	.ndo_tx_timeout		= nv_tx_timeout,
 	.ndo_change_mtu		= nv_change_mtu,
@@ -5379,7 +5478,7 @@ static const struct net_device_ops nv_netdev_ops = {
 static const struct net_device_ops nv_netdev_ops_optimized = {
 	.ndo_open		= nv_open,
 	.ndo_stop		= nv_close,
-	.ndo_get_stats		= nv_get_stats,
+	.ndo_get_stats64	= nv_get_stats64,
 	.ndo_start_xmit		= nv_start_xmit_optimized,
 	.ndo_tx_timeout		= nv_tx_timeout,
 	.ndo_change_mtu		= nv_change_mtu,
@@ -5418,6 +5517,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	np->dev = dev;
 	np->pci_dev = pci_dev;
 	spin_lock_init(&np->lock);
+	spin_lock_init(&np->stats_lock);
 	SET_NETDEV_DEV(dev, &pci_dev->dev);
 
 	init_timer(&np->oom_kick);
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 03/10] kbuild: document RPS/XPS network Kconfig options
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

This adds a description of RPS/XPS options and allow them to be
changed at make menuconfig time.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 net/Kconfig |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index a073148..1422c34 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -217,20 +217,30 @@ source "net/dns_resolver/Kconfig"
 source "net/batman-adv/Kconfig"
 
 config RPS
-	boolean
+	boolean "Enable Receive Packet Steering"
 	depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
 	default y
+	  RPS distributes the load of received packet processing
+	  across multiple CPUs. If unsure, say Y.
 
 config RFS_ACCEL
-	boolean
+	boolean "Enable Hardware Acceleration of RFS"
 	depends on RPS && GENERIC_HARDIRQS
 	select CPU_RMAP
 	default y
+	  This is the hardware version of RPS. On multi-queue network
+	  devices, this configures the hardware to distribute the
+	  received packets across multiple CPUs. If unsure, say Y.
 
 config XPS
-	boolean
+	boolean "Enable Transmit Packet Steering"
 	depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
 	default y
+	  For multiqueue devices, XPS selects a transmit queue during
+	  packet transmission based on configuration. This is done by
+	  mapping the CPU transmitting the packet to a queue. XPS can
+	  reduce transmit network latency on SMP systems. If unsure,
+	  say Y.
 
 config HAVE_BPF_JIT
 	bool
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 02/10] net-sysfs: fixed minor sparse warning
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

This commit fixes following warning:
net/core/net-sysfs.c:921:6: warning: symbol 'numa_node' shadows an earlier one
include/linux/topology.h:222:1: originally declared here



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 net/core/net-sysfs.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index c71c434..a64382f 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -901,7 +901,7 @@ static ssize_t store_xps_map(struct netdev_queue *queue,
 	struct xps_map *map, *new_map;
 	struct xps_dev_maps *dev_maps, *new_dev_maps;
 	int nonempty = 0;
-	int numa_node = -2;
+	int numa_node_id = -2;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -944,10 +944,10 @@ static ssize_t store_xps_map(struct netdev_queue *queue,
 		need_set = cpumask_test_cpu(cpu, mask) && cpu_online(cpu);
 #ifdef CONFIG_NUMA
 		if (need_set) {
-			if (numa_node == -2)
-				numa_node = cpu_to_node(cpu);
-			else if (numa_node != cpu_to_node(cpu))
-				numa_node = -1;
+			if (numa_node_id == -2)
+				numa_node_id = cpu_to_node(cpu);
+			else if (numa_node_id != cpu_to_node(cpu))
+				numa_node_id = -1;
 		}
 #endif
 		if (need_set && pos >= map_len) {
@@ -997,7 +997,7 @@ static ssize_t store_xps_map(struct netdev_queue *queue,
 	if (dev_maps)
 		kfree_rcu(dev_maps, rcu);
 
-	netdev_queue_numa_node_write(queue, (numa_node >= 0) ? numa_node :
+	netdev_queue_numa_node_write(queue, (numa_node_id >= 0) ? numa_node_id :
 					    NUMA_NO_NODE);
 
 	mutex_unlock(&xps_map_mutex);
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 05/10] forcedeth: Add messages to indicate using MSI or MSI-X
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, Mike Ditto, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

From: Mike Ditto <mditto@google.com>

This adds a few kernel messages to indicate whether PCIe interrupts
are signaled with MSI or MSI-X.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 374625b..fe17e42 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -3810,6 +3810,7 @@ static int nv_request_irq(struct net_device *dev, int intr_test)
 				writel(0, base + NvRegMSIXMap0);
 				writel(0, base + NvRegMSIXMap1);
 			}
+			netdev_info(dev, "MSI-X enabled\n");
 		}
 	}
 	if (ret != 0 && np->msi_flags & NV_MSI_CAPABLE) {
@@ -3831,6 +3832,7 @@ static int nv_request_irq(struct net_device *dev, int intr_test)
 			writel(0, base + NvRegMSIMap1);
 			/* enable msi vector 0 */
 			writel(NVREG_MSI_VECTOR_0_ENABLED, base + NvRegMSIIrqMask);
+			netdev_info(dev, "MSI enabled\n");
 		}
 	}
 	if (ret != 0) {
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 10/10] forcedeth: whitespace/indentation fixes
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 5d436b5..973cf79 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -65,7 +65,7 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/prefetch.h>
-#include  <linux/io.h>
+#include <linux/io.h>
 
 #include <asm/irq.h>
 #include <asm/system.h>
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 09/10] forcedeth: stats updated with a deferrable timer
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

Mark stats timer as deferrable: punctuality in waking the stats timer
callback doesn't matter much, as it is responsible only to avoid
integer wraparound.

We need at least 1 other timer to fire within 17s (fully loaded 1Gbps)
to avoid wrap-arounds. Desired period is still 10s.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index d9b5a4d..5d436b5 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -5534,7 +5534,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	init_timer(&np->nic_poll);
 	np->nic_poll.data = (unsigned long) dev;
 	np->nic_poll.function = nv_do_nic_poll;	/* timer handler */
-	init_timer(&np->stats_poll);
+	init_timer_deferrable(&np->stats_poll);
 	np->stats_poll.data = (unsigned long) dev;
 	np->stats_poll.function = nv_do_stats_poll;	/* timer handler */
 
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 08/10] forcedeth: account for dropped RX frames
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

This adds code to update the stats counter for dropped RX frames.



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 drivers/net/ethernet/nvidia/forcedeth.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 6aeb0d6..d9b5a4d 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -824,6 +824,7 @@ struct fe_priv {
 	struct nv_driver_stat stat_rx_packets;
 	struct nv_driver_stat stat_rx_bytes; /* not always available in HW */
 	struct nv_driver_stat stat_rx_missed_errors;
+	struct nv_driver_stat stat_rx_dropped;
 
 	/* media detection workaround.
 	 * Locking: Within irq hander or disable_irq+spin_lock(&np->lock);
@@ -1739,6 +1740,7 @@ static void nv_update_stats(struct net_device *dev)
 	/* update software stats */
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_packets);
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_bytes);
+	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_dropped);
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_rx_missed_errors);
 
 	NV_DRIVER_STAT_UPDATE_TOTAL(&np->stat_tx_packets);
@@ -1786,6 +1788,8 @@ nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
 			&np->stat_tx_bytes);
 		storage->rx_errors  = np->estats.rx_errors_total;
 		storage->tx_errors  = np->estats.tx_errors_total;
+		storage->rx_dropped = NV_DRIVER_STAT_GET_TOTAL(
+			&np->stat_rx_dropped);
 		storage->tx_dropped = NV_DRIVER_STAT_GET_TOTAL(
 			&np->stat_tx_dropped);
 
@@ -1841,8 +1845,10 @@ static int nv_alloc_rx(struct net_device *dev)
 				np->put_rx.orig = np->first_rx.orig;
 			if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
 				np->put_rx_ctx = np->first_rx_ctx;
-		} else
+		} else {
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_dropped);
 			return 1;
+		}
 	}
 	return 0;
 }
@@ -1873,8 +1879,10 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 				np->put_rx.ex = np->first_rx.ex;
 			if (unlikely(np->put_rx_ctx++ == np->last_rx_ctx))
 				np->put_rx_ctx = np->first_rx_ctx;
-		} else
+		} else {
+			NV_DRIVER_STAT_ATOMIC_INC(&np->stat_rx_dropped);
 			return 1;
+		}
 	}
 	return 0;
 }
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 04/10] net: provide counter for tx_timeout errors in sysfs
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

This adds the /sys/class/net/DEV/queues/Q/tx_timeout attribute
containing the total number of timeout events on the given queue. It
is always available, independently of CONFIG_RPS/XPS.

Credits to Stephen Hemminger for a preliminary version of this patch.

Tested:
  without CONFIG_SYSFS (compilation only)
  with sysfs and without CONFIG_RPS & CONFIG_XPS
  with sysfs and without CONFIG_RPS
  with sysfs and without CONFIG_XPS
  with defaults



Signed-off-by: David Decotigny <david.decotigny@google.com>
---
 include/linux/netdevice.h |   12 ++++++++++--
 net/core/net-sysfs.c      |   37 +++++++++++++++++++++++++++++++------
 net/sched/sch_generic.c   |    1 +
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cbeb586..9e6cf09 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -530,7 +530,7 @@ struct netdev_queue {
 	struct Qdisc		*qdisc;
 	unsigned long		state;
 	struct Qdisc		*qdisc_sleeping;
-#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
+#ifdef CONFIG_SYSFS
 	struct kobject		kobj;
 #endif
 #if defined(CONFIG_XPS) && defined(CONFIG_NUMA)
@@ -545,6 +545,12 @@ struct netdev_queue {
 	 * please use this field instead of dev->trans_start
 	 */
 	unsigned long		trans_start;
+
+	/*
+	 * Number of TX timeouts for this queue
+	 * (/sys/class/net/DEV/Q/trans_timeout)
+	 */
+	unsigned long		trans_timeout;
 } ____cacheline_aligned_in_smp;
 
 static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
@@ -1184,9 +1190,11 @@ struct net_device {
 
 	unsigned char		broadcast[MAX_ADDR_LEN];	/* hw bcast add	*/
 
-#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
+#ifdef CONFIG_SYSFS
 	struct kset		*queues_kset;
+#endif
 
+#ifdef CONFIG_RPS
 	struct netdev_rx_queue	*_rx;
 
 	/* Number of RX queues allocated at register_netdev() time */
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index a64382f..602b141 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -780,7 +780,7 @@ net_rx_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
 #endif
 }
 
-#ifdef CONFIG_XPS
+#ifdef CONFIG_SYSFS
 /*
  * netdev_queue sysfs structures and functions.
  */
@@ -826,6 +826,23 @@ static const struct sysfs_ops netdev_queue_sysfs_ops = {
 	.store = netdev_queue_attr_store,
 };
 
+static ssize_t show_trans_timeout(struct netdev_queue *queue,
+				  struct netdev_queue_attribute *attribute,
+				  char *buf)
+{
+	unsigned long trans_timeout;
+
+	spin_lock_irq(&queue->_xmit_lock);
+	trans_timeout = queue->trans_timeout;
+	spin_unlock_irq(&queue->_xmit_lock);
+
+	return sprintf(buf, "%lu", trans_timeout);
+}
+
+static struct netdev_queue_attribute queue_trans_timeout =
+	__ATTR(tx_timeout, S_IRUGO, show_trans_timeout, NULL);
+
+#ifdef CONFIG_XPS
 static inline unsigned int get_netdev_queue_index(struct netdev_queue *queue)
 {
 	struct net_device *dev = queue->dev;
@@ -1020,12 +1037,17 @@ error:
 
 static struct netdev_queue_attribute xps_cpus_attribute =
     __ATTR(xps_cpus, S_IRUGO | S_IWUSR, show_xps_map, store_xps_map);
+#endif /* CONFIG_XPS */
 
 static struct attribute *netdev_queue_default_attrs[] = {
+	&queue_trans_timeout.attr,
+#ifdef CONFIG_XPS
 	&xps_cpus_attribute.attr,
+#endif
 	NULL
 };
 
+#ifdef CONFIG_XPS
 static void netdev_queue_release(struct kobject *kobj)
 {
 	struct netdev_queue *queue = to_netdev_queue(kobj);
@@ -1076,10 +1098,13 @@ static void netdev_queue_release(struct kobject *kobj)
 	memset(kobj, 0, sizeof(*kobj));
 	dev_put(queue->dev);
 }
+#endif /* CONFIG_XPS */
 
 static struct kobj_type netdev_queue_ktype = {
 	.sysfs_ops = &netdev_queue_sysfs_ops,
+#ifdef CONFIG_XPS
 	.release = netdev_queue_release,
+#endif
 	.default_attrs = netdev_queue_default_attrs,
 };
 
@@ -1102,12 +1127,12 @@ static int netdev_queue_add_kobject(struct net_device *net, int index)
 
 	return error;
 }
-#endif /* CONFIG_XPS */
+#endif /* CONFIG_SYSFS */
 
 int
 netdev_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
 {
-#ifdef CONFIG_XPS
+#ifdef CONFIG_SYSFS
 	int i;
 	int error = 0;
 
@@ -1125,14 +1150,14 @@ netdev_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
 	return error;
 #else
 	return 0;
-#endif
+#endif /* CONFIG_SYSFS */
 }
 
 static int register_queue_kobjects(struct net_device *net)
 {
 	int error = 0, txq = 0, rxq = 0, real_rx = 0, real_tx = 0;
 
-#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
+#ifdef CONFIG_SYSFS
 	net->queues_kset = kset_create_and_add("queues",
 	    NULL, &net->dev.kobj);
 	if (!net->queues_kset)
@@ -1173,7 +1198,7 @@ static void remove_queue_kobjects(struct net_device *net)
 
 	net_rx_queue_update_kobjects(net, real_rx, 0);
 	netdev_queue_update_kobjects(net, real_tx, 0);
-#if defined(CONFIG_RPS) || defined(CONFIG_XPS)
+#ifdef CONFIG_SYSFS
 	kset_unregister(net->queues_kset);
 #endif
 }
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 69fca27..79ac145 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -246,6 +246,7 @@ static void dev_watchdog(unsigned long arg)
 				    time_after(jiffies, (trans_start +
 							 dev->watchdog_timeo))) {
 					some_queue_timedout = 1;
+					txq->trans_timeout++;
 					break;
 				}
 			}
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 01/10] forcedeth: fix stats on hardware without extended stats support
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, david decotigny
In-Reply-To: <cover.1321420513.git.david.decotigny@google.com>

From: david decotigny <david.decotigny@google.com>

This change makes sure that tx_packets/rx_bytes ifconfig counters are
updated even on NICs that don't provide hardware support for these
stats: they are now updated in software. For the sake of consistency,
we also now have tx_bytes updated in software (hardware counters
include ethernet CRC, and software doesn't account for it).

This reverts parts of:
 - "forcedeth: statistics optimization" (21828163b2)
 - "forcedeth: Improve stats counters" (0bdfea8ba8)
 - "forcedeth: remove unneeded stats updates" (4687f3f364)

Tested:
  pktgen + loopback (http://patchwork.ozlabs.org/patch/124698/)
  reports identical tx_packets/rx_packets and tx_bytes/rx_bytes.

Signed-off-by: David Decotigny <david.decotigny@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 898bdf2cb43eb0a962c397eb4dd1aec2c7211be2)


---
 drivers/net/ethernet/nvidia/forcedeth.c |   36 +++++++++++++++++++++++-------
 1 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index e8a5ae3..374625b 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -609,7 +609,7 @@ struct nv_ethtool_str {
 };
 
 static const struct nv_ethtool_str nv_estats_str[] = {
-	{ "tx_bytes" },
+	{ "tx_bytes" }, /* includes Ethernet FCS CRC */
 	{ "tx_zero_rexmt" },
 	{ "tx_one_rexmt" },
 	{ "tx_many_rexmt" },
@@ -637,7 +637,7 @@ static const struct nv_ethtool_str nv_estats_str[] = {
 	/* version 2 stats */
 	{ "tx_deferral" },
 	{ "tx_packets" },
-	{ "rx_bytes" },
+	{ "rx_bytes" }, /* includes Ethernet FCS CRC */
 	{ "tx_pause" },
 	{ "rx_pause" },
 	{ "rx_drop_frame" },
@@ -649,7 +649,7 @@ static const struct nv_ethtool_str nv_estats_str[] = {
 };
 
 struct nv_ethtool_stats {
-	u64 tx_bytes;
+	u64 tx_bytes; /* should be ifconfig->tx_bytes + 4*tx_packets */
 	u64 tx_zero_rexmt;
 	u64 tx_one_rexmt;
 	u64 tx_many_rexmt;
@@ -670,14 +670,14 @@ struct nv_ethtool_stats {
 	u64 rx_unicast;
 	u64 rx_multicast;
 	u64 rx_broadcast;
-	u64 rx_packets;
+	u64 rx_packets; /* should be ifconfig->rx_packets */
 	u64 rx_errors_total;
 	u64 tx_errors_total;
 
 	/* version 2 stats */
 	u64 tx_deferral;
-	u64 tx_packets;
-	u64 rx_bytes;
+	u64 tx_packets; /* should be ifconfig->tx_packets */
+	u64 rx_bytes;   /* should be ifconfig->rx_bytes + 4*rx_packets */
 	u64 tx_pause;
 	u64 rx_pause;
 	u64 rx_drop_frame;
@@ -1706,10 +1706,17 @@ static struct net_device_stats *nv_get_stats(struct net_device *dev)
 	if (np->driver_data & (DEV_HAS_STATISTICS_V1|DEV_HAS_STATISTICS_V2|DEV_HAS_STATISTICS_V3)) {
 		nv_get_hw_stats(dev);
 
+		/*
+		 * Note: because HW stats are not always available and
+		 * for consistency reasons, the following ifconfig
+		 * stats are managed by software: rx_bytes, tx_bytes,
+		 * rx_packets and tx_packets. The related hardware
+		 * stats reported by ethtool should be equivalent to
+		 * these ifconfig stats, with 4 additional bytes per
+		 * packet (Ethernet FCS CRC).
+		 */
+
 		/* copy to net_device stats */
-		dev->stats.tx_packets = np->estats.tx_packets;
-		dev->stats.rx_bytes = np->estats.rx_bytes;
-		dev->stats.tx_bytes = np->estats.tx_bytes;
 		dev->stats.tx_fifo_errors = np->estats.tx_fifo_errors;
 		dev->stats.tx_carrier_errors = np->estats.tx_carrier_errors;
 		dev->stats.rx_crc_errors = np->estats.rx_crc_errors;
@@ -2380,6 +2387,9 @@ static int nv_tx_done(struct net_device *dev, int limit)
 				if (flags & NV_TX_ERROR) {
 					if ((flags & NV_TX_RETRYERROR) && !(flags & NV_TX_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
+				} else {
+					dev->stats.tx_packets++;
+					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
 				}
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
@@ -2390,6 +2400,9 @@ static int nv_tx_done(struct net_device *dev, int limit)
 				if (flags & NV_TX2_ERROR) {
 					if ((flags & NV_TX2_RETRYERROR) && !(flags & NV_TX2_RETRYCOUNT_MASK))
 						nv_legacybackoff_reseed(dev);
+				} else {
+					dev->stats.tx_packets++;
+					dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
 				}
 				dev_kfree_skb_any(np->get_tx_ctx->skb);
 				np->get_tx_ctx->skb = NULL;
@@ -2429,6 +2442,9 @@ static int nv_tx_done_optimized(struct net_device *dev, int limit)
 					else
 						nv_legacybackoff_reseed(dev);
 				}
+			} else {
+				dev->stats.tx_packets++;
+				dev->stats.tx_bytes += np->get_tx_ctx->skb->len;
 			}
 
 			dev_kfree_skb_any(np->get_tx_ctx->skb);
@@ -2678,6 +2694,7 @@ static int nv_rx_process(struct net_device *dev, int limit)
 		skb->protocol = eth_type_trans(skb, dev);
 		napi_gro_receive(&np->napi, skb);
 		dev->stats.rx_packets++;
+		dev->stats.rx_bytes += len;
 next_pkt:
 		if (unlikely(np->get_rx.orig++ == np->last_rx.orig))
 			np->get_rx.orig = np->first_rx.orig;
@@ -2761,6 +2778,7 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 			}
 			napi_gro_receive(&np->napi, skb);
 			dev->stats.rx_packets++;
+			dev->stats.rx_bytes += len;
 		} else {
 			dev_kfree_skb(skb);
 		}
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH net-next v5 00/10] net-sysfs+forcedeth: stats & debug enhancements
From: David Decotigny @ 2011-11-16  5:15 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny

These changes provide a new tx_timeout sysfs attribute and implement
the ndo_get_stats64 API. They also add a few more stats and debugging
features for forcedeth. They ensure that stats updates are correct in
SMP systems, 32 or 64-bits.

Note: patch 1 is the cherry-pick of 898bdf2cb43e ("forcedeth: fix
stats on hardware without extended stats support")

Changes since v4:
  - tx_timeout counter now a generic sysfs attribute. Credits to
    Stephen Hemminger for initial implementation
  - revert get_stats64 to using atomic variables: see
    http://patchwork.ozlabs.org/patch/125861/ for motivations
  - dropped patch "expose module parameters in /sys/module" for now: I
    will work on this later, following Stephen's recommendations
    (http://patchwork.ozlabs.org/patch/125862/).

Changes since v3:
  - updated get_stats64 + rx_dropped patches to use u64_stats_sync.h
  - dropped indentation "whitespace/indentation fixes" (included in
    get_stats64 api patch)

Changes since v2:
  - patch 1/9 is the cherry-pick of 898bdf2cb43e ("forcedeth: fix
    stats on hardware without extended stats support")
  - removed patch 5/10 "stats for rx_packets based on hardware
    registers" because packets&bytes stats are updated in software
    only (898bdf2cb43e)

Changes since v1:
  - patch 1/10 is the same as
    http://patchwork.ozlabs.org/patch/125017/ (targetting net)
  - other patches updated to take patch 1/10 into account
  - various commit message updates


Tested:
  ~150Mbps incoming TCP, ethtool -S in a loop, x86_64 16-way:
     tx_bytes: 5441989419
     rx_packets: 5439224
     tx_timeout: 0
     tx_packets: 5456705
     rx_bytes: 5566763850

Tested:
  pktgen + loopback report same RX/TX packets and bytes stats

Tested:
  tests above with Kconfig DEBUG_PAGEALLOC DEBUG_MUTEXES
  DEBUG_SPINLOCK LOCKUP_DETECTOR DEBUG_RT_MUTEXES DEBUG_LOCK_ALLOC
  PROVE_LOCKING DEBUG_ATOMIC_SLEEP DEBUG_STACK_USAGE DEBUG_KOBJECT
  DEBUG_VM DEBUG_LIST DEBUG_SG DEBUG_NOTIFIERS TEST_KSTRTOX
  STRICT_DEVMEM DEBUG_STACKOVERFLOW


############################################
# Patch Set Summary:

David Decotigny (7):
  net-sysfs: fixed minor sparse warning
  kbuild: document RPS/XPS network Kconfig options
  net: provide counter for tx_timeout errors in sysfs
  forcedeth: implement ndo_get_stats64() API
  forcedeth: account for dropped RX frames
  forcedeth: stats updated with a deferrable timer
  forcedeth: whitespace/indentation fixes

Mike Ditto (1):
  forcedeth: Add messages to indicate using MSI or MSI-X

Sameer Nanda (1):
  forcedeth: allow to silence "TX timeout" debug messages

david decotigny (1):
  forcedeth: fix stats on hardware without extended stats support

 drivers/net/ethernet/nvidia/forcedeth.c |  320 ++++++++++++++++++++++---------
 include/linux/netdevice.h               |   12 +-
 net/Kconfig                             |   16 ++-
 net/core/net-sysfs.c                    |   49 ++++--
 net/sched/sch_generic.c                 |    1 +
 5 files changed, 293 insertions(+), 105 deletions(-)

-- 
1.7.3.1

^ permalink raw reply

* [PATCH net-next v4 00/10] net-sysfs+forcedeth: stats & debug enhancements
From: David Decotigny @ 2011-11-16  5:13 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla, David Decotigny

These changes provide a new tx_timeout sysfs attribute and implement
the ndo_get_stats64 API. They also add a few more stats and debugging
features for forcedeth. They ensure that stats updates are correct in
SMP systems, 32 or 64-bits.

Note: patch 1 is the cherry-pick of 898bdf2cb43e ("forcedeth: fix
stats on hardware without extended stats support")

Changes since v4:
  - tx_timeout counter now a generic sysfs attribute. Credits to
    Stephen Hemminger for initial implementation
  - revert get_stats64 to using atomic variables: see
    http://patchwork.ozlabs.org/patch/125861/ for motivations
  - dropped patch "expose module parameters in /sys/module" for now: I
    will work on this later, following Stephen's recommendations
    (http://patchwork.ozlabs.org/patch/125862/).

Changes since v3:
  - updated get_stats64 + rx_dropped patches to use u64_stats_sync.h
  - dropped indentation "whitespace/indentation fixes" (included in
    get_stats64 api patch)

Changes since v2:
  - patch 1/9 is the cherry-pick of 898bdf2cb43e ("forcedeth: fix
    stats on hardware without extended stats support")
  - removed patch 5/10 "stats for rx_packets based on hardware
    registers" because packets&bytes stats are updated in software
    only (898bdf2cb43e)

Changes since v1:
  - patch 1/10 is the same as
    http://patchwork.ozlabs.org/patch/125017/ (targetting net)
  - other patches updated to take patch 1/10 into account
  - various commit message updates


Tested:
  ~150Mbps incoming TCP, ethtool -S in a loop, x86_64 16-way:
     tx_bytes: 5441989419
     rx_packets: 5439224
     tx_timeout: 0
     tx_packets: 5456705
     rx_bytes: 5566763850

Tested:
  pktgen + loopback report same RX/TX packets and bytes stats

Tested:
  tests above with Kconfig DEBUG_PAGEALLOC DEBUG_MUTEXES
  DEBUG_SPINLOCK LOCKUP_DETECTOR DEBUG_RT_MUTEXES DEBUG_LOCK_ALLOC
  PROVE_LOCKING DEBUG_ATOMIC_SLEEP DEBUG_STACK_USAGE DEBUG_KOBJECT
  DEBUG_VM DEBUG_LIST DEBUG_SG DEBUG_NOTIFIERS TEST_KSTRTOX
  STRICT_DEVMEM DEBUG_STACKOVERFLOW


############################################
# Patch Set Summary:

David Decotigny (7):
  net-sysfs: fixed minor sparse warning
  kbuild: document RPS/XPS network Kconfig options
  net: provide counter for tx_timeout errors in sysfs
  forcedeth: implement ndo_get_stats64() API
  forcedeth: account for dropped RX frames
  forcedeth: stats updated with a deferrable timer
  forcedeth: whitespace/indentation fixes

Mike Ditto (1):
  forcedeth: Add messages to indicate using MSI or MSI-X

Sameer Nanda (1):
  forcedeth: allow to silence "TX timeout" debug messages

david decotigny (1):
  forcedeth: fix stats on hardware without extended stats support

 drivers/net/ethernet/nvidia/forcedeth.c |  320 ++++++++++++++++++++++---------
 include/linux/netdevice.h               |   12 +-
 net/Kconfig                             |   16 ++-
 net/core/net-sysfs.c                    |   49 ++++--
 net/sched/sch_generic.c                 |    1 +
 5 files changed, 293 insertions(+), 105 deletions(-)

-- 
1.7.3.1

^ permalink raw reply

* [PATCH iproute2] ss: report ecnseen
From: Eric Dumazet @ 2011-11-16  4:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <1318389501.3686.8.camel@edumazet-laptop>

Support ECNSEEN reporting in ss command.

ESTAB      0      0           10.170.73.123:4900
10.170.73.125:51001    uid:501 ino:385994 sk:f31e5f00
         mem:(r0,w0,f0,t0) ts sack ecn ecnseen bic wscale:8,8 rto:210
rtt:18.75/15 ato:40 cwnd:10 send 69.9Mbps rcv_space:32768

"ecn" means TCP session negociated ECN capability (TCP layer) at setup
time

"ecnseen" at least one frame with ECT(0) or ECT(1) or ECN (IP layer) was
received from peer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/netinet/tcp.h |    1 +
 misc/ss.c             |    2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/netinet/tcp.h b/include/netinet/tcp.h
index 282b29c..95a4fe6 100644
--- a/include/netinet/tcp.h
+++ b/include/netinet/tcp.h
@@ -172,6 +172,7 @@ enum
 # define TCPI_OPT_SACK		2
 # define TCPI_OPT_WSCALE	4
 # define TCPI_OPT_ECN		8
+# define TCPI_OPT_ECNSEEN	16
 
 /* Values for tcpi_state.  */
 enum tcp_ca_state
diff --git a/misc/ss.c b/misc/ss.c
index b00841b..778bf0a 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1357,6 +1357,8 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r)
 				printf(" sack");
 			if (info->tcpi_options & TCPI_OPT_ECN)
 				printf(" ecn");
+			if (info->tcpi_options & TCPI_OPT_ECNSEEN)
+				printf(" ecnseen");
 		}
 
 		if (tb[INET_DIAG_CONG])

^ permalink raw reply related

* Re: [RFC] iproute:  Support cross-compiling.
From: Eric Dumazet @ 2011-11-16  4:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: greearb, netdev
In-Reply-To: <20111115171722.02fa1c77@nehalam.linuxnetplumber.net>

Le mardi 15 novembre 2011 à 17:17 -0800, Stephen Hemminger a écrit :
> On Tue, 15 Nov 2011 17:04:42 -0800
> greearb@candelatech.com wrote:
> 
> > From: Ben Greear <greearb@candelatech.com>
> > 
> > This lets users use their own compiler instead of
> > hard-coding to use gcc.
> > 
> > Also adds tests to disable some things that were not supported
> > in my ARM cross-compile toolchain.
> > 
> > Signed-off-by: Ben Greear <greearb@candelatech.com>
> 
> Can't you do this by setting up the compile environment better?
> I would rather have the tools work with all features and handle
> errors from kernel rather than neutering it.

I have roughly same errors when compiling on RHEL4

glibc-2.3.4-2.36 doesnt contain unshare() call, but my dev kernel
certainly have it ;)

^ permalink raw reply

* [PATCH] net: add calxeda xgmac ethernet driver
From: Rob Herring @ 2011-11-16  2:46 UTC (permalink / raw)
  To: netdev, devicetree-discuss, joe; +Cc: saeed.bishara, Rob Herring

From: Rob Herring <rob.herring@calxeda.com>

Add support for the XGMAC 10Gb ethernet device in the Calxeda Highbank
SOC.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
---
v3:
- add dma_unmap_page calls for skb frags
- use module_platform_driver
- rebase to 3.2-rc2

v2:
- use __le32 for descriptor fields and cpu_to_le32/le32_to_cpu to access
- use u32 instead of dma_addr_t for descriptor phys addresses
- improve allocation error handling for descriptor ring allocations
- convert all prints to netdev_XXX
- rebase to current Linus master
- move into drivers/net/ethernet

 .../devicetree/bindings/net/calxeda-xgmac.txt      |   16 +
 drivers/net/ethernet/Kconfig                       |    1 +
 drivers/net/ethernet/Makefile                      |    1 +
 drivers/net/ethernet/calxeda/Kconfig               |    7 +
 drivers/net/ethernet/calxeda/Makefile              |    2 +
 drivers/net/ethernet/calxeda/xgmac.c               | 1927 ++++++++++++++++++++
 6 files changed, 1954 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/calxeda-xgmac.txt
 create mode 100644 drivers/net/ethernet/calxeda/Kconfig
 create mode 100644 drivers/net/ethernet/calxeda/Makefile
 create mode 100644 drivers/net/ethernet/calxeda/xgmac.c

diff --git a/Documentation/devicetree/bindings/net/calxeda-xgmac.txt b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt
new file mode 100644
index 0000000..c03a7bc
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt
@@ -0,0 +1,16 @@
+* Calxeda Highbank 10Gb XGMAC Ethernet
+
+Required properties:
+- compatible : Should be "calxeda,hb-xgmac"
+- reg : Address and length of the register set for the device
+- interrupts : Should contain 3 xgmac interrupts. The 1st is main interrupt.
+  The 2nd is pwr mgt interrupt. The 3rd is low power state interrupt.
+
+Example:
+
+ethernet@fff50000 {
+        compatible = "calxeda,hb-xgmac";
+        reg = <0xfff50000 0x1000>;
+        interrupts = <0 77 4  0 78 4  0 79 4>;
+};
+
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 597f4d4..3474a61 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -28,6 +28,7 @@ source "drivers/net/ethernet/cadence/Kconfig"
 source "drivers/net/ethernet/adi/Kconfig"
 source "drivers/net/ethernet/broadcom/Kconfig"
 source "drivers/net/ethernet/brocade/Kconfig"
+source "drivers/net/ethernet/calxeda/Kconfig"
 source "drivers/net/ethernet/chelsio/Kconfig"
 source "drivers/net/ethernet/cirrus/Kconfig"
 source "drivers/net/ethernet/cisco/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index be5dde0..cd6d69a 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_NET_ATMEL) += cadence/
 obj-$(CONFIG_NET_BFIN) += adi/
 obj-$(CONFIG_NET_VENDOR_BROADCOM) += broadcom/
 obj-$(CONFIG_NET_VENDOR_BROCADE) += brocade/
+obj-$(CONFIG_NET_CALXEDA_XGMAC) += calxeda/
 obj-$(CONFIG_NET_VENDOR_CHELSIO) += chelsio/
 obj-$(CONFIG_NET_VENDOR_CIRRUS) += cirrus/
 obj-$(CONFIG_NET_VENDOR_CISCO) += cisco/
diff --git a/drivers/net/ethernet/calxeda/Kconfig b/drivers/net/ethernet/calxeda/Kconfig
new file mode 100644
index 0000000..a52e725
--- /dev/null
+++ b/drivers/net/ethernet/calxeda/Kconfig
@@ -0,0 +1,7 @@
+config NET_CALXEDA_XGMAC
+	tristate "Calxeda 1G/10G XGMAC Ethernet driver"
+
+	select CRC32
+	help
+	  This is the driver for the XGMAC Ethernet IP block found on Calxeda
+	  Highbank platforms.
diff --git a/drivers/net/ethernet/calxeda/Makefile b/drivers/net/ethernet/calxeda/Makefile
new file mode 100644
index 0000000..5057cd2
--- /dev/null
+++ b/drivers/net/ethernet/calxeda/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_NET_CALXEDA_XGMAC) += xgmac.o
+
diff --git a/drivers/net/ethernet/calxeda/xgmac.c b/drivers/net/ethernet/calxeda/xgmac.c
new file mode 100644
index 0000000..dc23404
--- /dev/null
+++ b/drivers/net/ethernet/calxeda/xgmac.c
@@ -0,0 +1,1927 @@
+/*
+ * Copyright 2010-2011 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/circ_buf.h>
+#include <linux/interrupt.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/skbuff.h>
+#include <linux/ethtool.h>
+#include <linux/if.h>
+#include <linux/crc32.h>
+#include <linux/dma-mapping.h>
+#include <linux/slab.h>
+
+/* XGMAC Register definitions */
+#define XGMAC_CONTROL		0x00000000	/* MAC Configuration */
+#define XGMAC_FRAME_FILTER	0x00000004	/* MAC Frame Filter */
+#define XGMAC_FLOW_CTRL		0x00000018	/* MAC Flow Control */
+#define XGMAC_VLAN_TAG		0x0000001C	/* VLAN Tags */
+#define XGMAC_VERSION		0x00000020	/* Version */
+#define XGMAC_VLAN_INCL		0x00000024	/* VLAN tag for tx frames */
+#define XGMAC_LPI_CTRL		0x00000028	/* LPI Control and Status */
+#define XGMAC_LPI_TIMER		0x0000002C	/* LPI Timers Control */
+#define XGMAC_TX_PACE		0x00000030	/* Transmit Pace and Stretch */
+#define XGMAC_VLAN_HASH		0x00000034	/* VLAN Hash Table */
+#define XGMAC_DEBUG		0x00000038	/* Debug */
+#define XGMAC_INT_STAT		0x0000003C	/* Interrupt and Control */
+#define XGMAC_ADDR_HIGH(reg)	(0x00000040+((reg) * 8))
+#define XGMAC_ADDR_LOW(reg)	(0x00000044+((reg) * 8))
+#define XGMAC_HASH(n)		(0x00000300 + (n) * 4) /* HASH table regs */
+#define XGMAC_NUM_HASH		16
+#define XGMAC_OMR		0x00000400
+#define XGMAC_REMOTE_WAKE	0x00000700	/* Remote Wake-Up Frm Filter */
+#define XGMAC_PMT		0x00000704	/* PMT Control and Status */
+#define XGMAC_MMC_CTRL		0x00000800	/* XGMAC MMC Control */
+#define XGMAC_MMC_INTR_RX	0x00000804	/* Recieve Interrupt */
+#define XGMAC_MMC_INTR_TX	0x00000808	/* Transmit Interrupt */
+#define XGMAC_MMC_INTR_MASK_RX	0x0000080c	/* Recieve Interrupt Mask */
+#define XGMAC_MMC_INTR_MASK_TX	0x00000810	/* Transmit Interrupt Mask */
+
+/* Hardware TX Statistics Counters */
+#define XGMAC_MMC_TXOCTET_GB_LO	0x00000814
+#define XGMAC_MMC_TXOCTET_GB_HI	0x00000818
+#define XGMAC_MMC_TXFRAME_GB_LO	0x0000081C
+#define XGMAC_MMC_TXFRAME_GB_HI	0x00000820
+#define XGMAC_MMC_TXBCFRAME_G	0x00000824
+#define XGMAC_MMC_TXMCFRAME_G	0x0000082C
+#define XGMAC_MMC_TXUCFRAME_GB	0x00000864
+#define XGMAC_MMC_TXMCFRAME_GB	0x0000086C
+#define XGMAC_MMC_TXBCFRAME_GB	0x00000874
+#define XGMAC_MMC_TXUNDERFLOW	0x0000087C
+#define XGMAC_MMC_TXOCTET_G_LO	0x00000884
+#define XGMAC_MMC_TXOCTET_G_HI	0x00000888
+#define XGMAC_MMC_TXFRAME_G_LO	0x0000088C
+#define XGMAC_MMC_TXFRAME_G_HI	0x00000890
+#define XGMAC_MMC_TXPAUSEFRAME	0x00000894
+#define XGMAC_MMC_TXVLANFRAME	0x0000089C
+
+/* Hardware RX Statistics Counters */
+#define XGMAC_MMC_RXFRAME_GB_LO	0x00000900
+#define XGMAC_MMC_RXFRAME_GB_HI	0x00000904
+#define XGMAC_MMC_RXOCTET_GB_LO	0x00000908
+#define XGMAC_MMC_RXOCTET_GB_HI	0x0000090C
+#define XGMAC_MMC_RXOCTET_G_LO	0x00000910
+#define XGMAC_MMC_RXOCTET_G_HI	0x00000914
+#define XGMAC_MMC_RXBCFRAME_G	0x00000918
+#define XGMAC_MMC_RXMCFRAME_G	0x00000920
+#define XGMAC_MMC_RXCRCERR	0x00000928
+#define XGMAC_MMC_RXRUNT	0x00000930
+#define XGMAC_MMC_RXJABBER	0x00000934
+#define XGMAC_MMC_RXUCFRAME_G	0x00000970
+#define XGMAC_MMC_RXLENGTHERR	0x00000978
+#define XGMAC_MMC_RXOVERFLOW	0x00000990
+#define XGMAC_MMC_RXPAUSEFRAME	0x00000988
+#define XGMAC_MMC_RXVLANFRAME	0x00000998
+
+/* DMA Control and Status Registers */
+#define XGMAC_DMA_BUS_MODE	0x00000f00	/* Bus Mode */
+#define XGMAC_DMA_TX_POLL	0x00000f04	/* Transmit Poll Demand */
+#define XGMAC_DMA_RX_POLL	0x00000f08	/* Received Poll Demand */
+#define XGMAC_DMA_RX_BASE_ADDR	0x00000f0c	/* Receive List Base */
+#define XGMAC_DMA_TX_BASE_ADDR	0x00000f10	/* Transmit List Base */
+#define XGMAC_DMA_STATUS	0x00000f14	/* Status Register */
+#define XGMAC_DMA_CONTROL	0x00000f18	/* Ctrl (Operational Mode) */
+#define XGMAC_DMA_INTR_ENA	0x00000f1c	/* Interrupt Enable */
+#define XGMAC_DMA_MISS_FRAME_CTR 0x00000f20	/* Missed Frame Counter */
+#define XGMAC_DMA_RI_WDOG_TIMER	0x00000f24	/* RX Intr Watchdog Timer */
+#define XGMAC_DMA_AXI_BUS	0x00000f28	/* AXI Bus Mode */
+#define XGMAC_DMA_AXI_STATUS	0x00000f2C	/* AXI Status */
+#define XGMAC_DMA_HW_FEATURE	0x00000f58	/* Enabled Hardware Features */
+
+#define XGMAC_ADDR_AE		0x80000000
+#define XGMAC_MAX_FILTER_ADDR	31
+
+/* PMT Control and Status */
+#define XGMAC_PMT_POINTER_RESET	0x80000000
+#define XGMAC_PMT_GLBL_UNICAST	0x00000200
+#define XGMAC_PMT_WAKEUP_RX_FRM	0x00000040
+#define XGMAC_PMT_MAGIC_PKT	0x00000020
+#define XGMAC_PMT_WAKEUP_FRM_EN	0x00000004
+#define XGMAC_PMT_MAGIC_PKT_EN	0x00000002
+#define XGMAC_PMT_POWERDOWN	0x00000001
+
+#define XGMAC_CONTROL_SPD	0x40000000	/* Speed control */
+#define XGMAC_CONTROL_SPD_MASK	0x60000000
+#define XGMAC_CONTROL_SPD_1G	0x60000000
+#define XGMAC_CONTROL_SPD_2_5G	0x40000000
+#define XGMAC_CONTROL_SPD_10G	0x00000000
+#define XGMAC_CONTROL_SARC	0x10000000	/* Source Addr Insert/Replace */
+#define XGMAC_CONTROL_SARK_MASK	0x18000000
+#define XGMAC_CONTROL_CAR	0x04000000	/* CRC Addition/Replacement */
+#define XGMAC_CONTROL_CAR_MASK	0x06000000
+#define XGMAC_CONTROL_DP	0x01000000	/* Disable Padding */
+#define XGMAC_CONTROL_WD	0x00800000	/* Disable Watchdog on rx */
+#define XGMAC_CONTROL_JD	0x00400000	/* Jabber disable */
+#define XGMAC_CONTROL_JE	0x00100000	/* Jumbo frame */
+#define XGMAC_CONTROL_LM	0x00001000	/* Loop-back mode */
+#define XGMAC_CONTROL_IPC	0x00000400	/* Checksum Offload */
+#define XGMAC_CONTROL_ACS	0x00000080	/* Automatic Pad/FCS Strip */
+#define XGMAC_CONTROL_DDIC	0x00000010	/* Disable Deficit Idle Count */
+#define XGMAC_CONTROL_TE	0x00000008	/* Transmitter Enable */
+#define XGMAC_CONTROL_RE	0x00000004	/* Receiver Enable */
+
+/* XGMAC Frame Filter defines */
+#define XGMAC_FRAME_FILTER_PR	0x00000001	/* Promiscuous Mode */
+#define XGMAC_FRAME_FILTER_HUC	0x00000002	/* Hash Unicast */
+#define XGMAC_FRAME_FILTER_HMC	0x00000004	/* Hash Multicast */
+#define XGMAC_FRAME_FILTER_DAIF	0x00000008	/* DA Inverse Filtering */
+#define XGMAC_FRAME_FILTER_PM	0x00000010	/* Pass all multicast */
+#define XGMAC_FRAME_FILTER_DBF	0x00000020	/* Disable Broadcast frames */
+#define XGMAC_FRAME_FILTER_SAIF	0x00000100	/* Inverse Filtering */
+#define XGMAC_FRAME_FILTER_SAF	0x00000200	/* Source Address Filter */
+#define XGMAC_FRAME_FILTER_HPF	0x00000400	/* Hash or perfect Filter */
+#define XGMAC_FRAME_FILTER_VHF	0x00000800	/* VLAN Hash Filter */
+#define XGMAC_FRAME_FILTER_VPF	0x00001000	/* VLAN Perfect Filter */
+#define XGMAC_FRAME_FILTER_RA	0x80000000	/* Receive all mode */
+
+/* XGMAC FLOW CTRL defines */
+#define XGMAC_FLOW_CTRL_PT_MASK	0xffff0000	/* Pause Time Mask */
+#define XGMAC_FLOW_CTRL_PT_SHIFT	16
+#define XGMAC_FLOW_CTRL_DZQP	0x00000080	/* Disable Zero-Quanta Phase */
+#define XGMAC_FLOW_CTRL_PLT	0x00000020	/* Pause Low Threshhold */
+#define XGMAC_FLOW_CTRL_PLT_MASK 0x00000030	/* PLT MASK */
+#define XGMAC_FLOW_CTRL_UP	0x00000008	/* Unicast Pause Frame Detect */
+#define XGMAC_FLOW_CTRL_RFE	0x00000004	/* Rx Flow Control Enable */
+#define XGMAC_FLOW_CTRL_TFE	0x00000002	/* Tx Flow Control Enable */
+#define XGMAC_FLOW_CTRL_FCB_BPA	0x00000001	/* Flow Control Busy ... */
+
+/* XGMAC_INT_STAT reg */
+#define XGMAC_INT_STAT_PMT	0x0080		/* PMT Interrupt Status */
+#define XGMAC_INT_STAT_LPI	0x0040		/* LPI Interrupt Status */
+
+/* DMA Bus Mode register defines */
+#define DMA_BUS_MODE_SFT_RESET	0x00000001	/* Software Reset */
+#define DMA_BUS_MODE_DSL_MASK	0x0000007c	/* Descriptor Skip Length */
+#define DMA_BUS_MODE_DSL_SHIFT	2		/* (in DWORDS) */
+#define DMA_BUS_MODE_ATDS	0x00000080	/* Alternate Descriptor Size */
+
+/* Programmable burst length */
+#define DMA_BUS_MODE_PBL_MASK	0x00003f00	/* Programmable Burst Len */
+#define DMA_BUS_MODE_PBL_SHIFT	8
+#define DMA_BUS_MODE_FB		0x00010000	/* Fixed burst */
+#define DMA_BUS_MODE_RPBL_MASK	0x003e0000	/* Rx-Programmable Burst Len */
+#define DMA_BUS_MODE_RPBL_SHIFT	17
+#define DMA_BUS_MODE_USP	0x00800000
+#define DMA_BUS_MODE_8PBL	0x01000000
+#define DMA_BUS_MODE_AAL	0x02000000
+
+/* DMA Bus Mode register defines */
+#define DMA_BUS_PR_RATIO_MASK	0x0000c000	/* Rx/Tx priority ratio */
+#define DMA_BUS_PR_RATIO_SHIFT	14
+#define DMA_BUS_FB		0x00010000	/* Fixed Burst */
+
+/* DMA Control register defines */
+#define DMA_CONTROL_ST		0x00002000	/* Start/Stop Transmission */
+#define DMA_CONTROL_SR		0x00000002	/* Start/Stop Receive */
+#define DMA_CONTROL_DFF		0x01000000	/* Disable flush of rx frames */
+
+/* DMA Normal interrupt */
+#define DMA_INTR_ENA_NIE	0x00010000	/* Normal Summary */
+#define DMA_INTR_ENA_AIE	0x00008000	/* Abnormal Summary */
+#define DMA_INTR_ENA_ERE	0x00004000	/* Early Receive */
+#define DMA_INTR_ENA_FBE	0x00002000	/* Fatal Bus Error */
+#define DMA_INTR_ENA_ETE	0x00000400	/* Early Transmit */
+#define DMA_INTR_ENA_RWE	0x00000200	/* Receive Watchdog */
+#define DMA_INTR_ENA_RSE	0x00000100	/* Receive Stopped */
+#define DMA_INTR_ENA_RUE	0x00000080	/* Receive Buffer Unavailable */
+#define DMA_INTR_ENA_RIE	0x00000040	/* Receive Interrupt */
+#define DMA_INTR_ENA_UNE	0x00000020	/* Tx Underflow */
+#define DMA_INTR_ENA_OVE	0x00000010	/* Receive Overflow */
+#define DMA_INTR_ENA_TJE	0x00000008	/* Transmit Jabber */
+#define DMA_INTR_ENA_TUE	0x00000004	/* Transmit Buffer Unavail */
+#define DMA_INTR_ENA_TSE	0x00000002	/* Transmit Stopped */
+#define DMA_INTR_ENA_TIE	0x00000001	/* Transmit Interrupt */
+
+#define DMA_INTR_NORMAL		(DMA_INTR_ENA_NIE | DMA_INTR_ENA_RIE | \
+				 DMA_INTR_ENA_TUE)
+
+#define DMA_INTR_ABNORMAL	(DMA_INTR_ENA_AIE | DMA_INTR_ENA_FBE | \
+				 DMA_INTR_ENA_RWE | DMA_INTR_ENA_RSE | \
+				 DMA_INTR_ENA_RUE | DMA_INTR_ENA_UNE | \
+				 DMA_INTR_ENA_OVE | DMA_INTR_ENA_TJE | \
+				 DMA_INTR_ENA_TSE)
+
+/* DMA default interrupt mask */
+#define DMA_INTR_DEFAULT_MASK	(DMA_INTR_NORMAL | DMA_INTR_ABNORMAL)
+
+/* DMA Status register defines */
+#define DMA_STATUS_GMI		0x08000000	/* MMC interrupt */
+#define DMA_STATUS_GLI		0x04000000	/* GMAC Line interface int */
+#define DMA_STATUS_EB_MASK	0x00380000	/* Error Bits Mask */
+#define DMA_STATUS_EB_TX_ABORT	0x00080000	/* Error Bits - TX Abort */
+#define DMA_STATUS_EB_RX_ABORT	0x00100000	/* Error Bits - RX Abort */
+#define DMA_STATUS_TS_MASK	0x00700000	/* Transmit Process State */
+#define DMA_STATUS_TS_SHIFT	20
+#define DMA_STATUS_RS_MASK	0x000e0000	/* Receive Process State */
+#define DMA_STATUS_RS_SHIFT	17
+#define DMA_STATUS_NIS		0x00010000	/* Normal Interrupt Summary */
+#define DMA_STATUS_AIS		0x00008000	/* Abnormal Interrupt Summary */
+#define DMA_STATUS_ERI		0x00004000	/* Early Receive Interrupt */
+#define DMA_STATUS_FBI		0x00002000	/* Fatal Bus Error Interrupt */
+#define DMA_STATUS_ETI		0x00000400	/* Early Transmit Interrupt */
+#define DMA_STATUS_RWT		0x00000200	/* Receive Watchdog Timeout */
+#define DMA_STATUS_RPS		0x00000100	/* Receive Process Stopped */
+#define DMA_STATUS_RU		0x00000080	/* Receive Buffer Unavailable */
+#define DMA_STATUS_RI		0x00000040	/* Receive Interrupt */
+#define DMA_STATUS_UNF		0x00000020	/* Transmit Underflow */
+#define DMA_STATUS_OVF		0x00000010	/* Receive Overflow */
+#define DMA_STATUS_TJT		0x00000008	/* Transmit Jabber Timeout */
+#define DMA_STATUS_TU		0x00000004	/* Transmit Buffer Unavail */
+#define DMA_STATUS_TPS		0x00000002	/* Transmit Process Stopped */
+#define DMA_STATUS_TI		0x00000001	/* Transmit Interrupt */
+
+/* Common MAC defines */
+#define MAC_ENABLE_TX		0x00000008	/* Transmitter Enable */
+#define MAC_ENABLE_RX		0x00000004	/* Receiver Enable */
+
+/* XGMAC Operation Mode Register */
+#define XGMAC_OMR_TSF		0x00200000	/* TX FIFO Store and Forward */
+#define XGMAC_OMR_FTF		0x00100000	/* Flush Transmit FIFO */
+#define XGMAC_OMR_TTC		0x00020000	/* Transmit Threshhold Ctrl */
+#define XGMAC_OMR_TTC_MASK	0x00030000
+#define XGMAC_OMR_RFD		0x00006000	/* FC Deactivation Threshhold */
+#define XGMAC_OMR_RFD_MASK	0x00007000	/* FC Deact Threshhold MASK */
+#define XGMAC_OMR_RFA		0x00000600	/* FC Activation Threshhold */
+#define XGMAC_OMR_RFA_MASK	0x00000E00	/* FC Act Threshhold MASK */
+#define XGMAC_OMR_EFC		0x00000100	/* Enable Hardware FC */
+#define XGMAC_OMR_FEF		0x00000080	/* Forward Error Frames */
+#define XGMAC_OMR_DT		0x00000040	/* Drop TCP/IP csum Errors */
+#define XGMAC_OMR_RSF		0x00000020	/* RX FIFO Store and Forward */
+#define XGMAC_OMR_RTC		0x00000010	/* RX Threshhold Ctrl */
+#define XGMAC_OMR_RTC_MASK	0x00000018	/* RX Threshhold Ctrl MASK */
+
+/* XGMAC HW Features Register */
+#define DMA_HW_FEAT_TXCOESEL	0x00010000	/* TX Checksum offload */
+
+/* XGMAC Descriptor Defines */
+#define MAX_DESC_BUF_SZ		(SZ_8K - 8)
+
+#define RXDESC_EXT_STATUS	0x00000001
+#define RXDESC_CRC_ERR		0x00000002
+#define RXDESC_RX_ERR		0x00000008
+#define RXDESC_RX_WDOG		0x00000010
+#define RXDESC_FRAME_TYPE	0x00000020
+#define RXDESC_GIANT_FRAME	0x00000080
+#define RXDESC_LAST_SEG		0x00000100
+#define RXDESC_FIRST_SEG	0x00000200
+#define RXDESC_VLAN_FRAME	0x00000400
+#define RXDESC_OVERFLOW_ERR	0x00000800
+#define RXDESC_LENGTH_ERR	0x00001000
+#define RXDESC_SA_FILTER_FAIL	0x00002000
+#define RXDESC_DESCRIPTOR_ERR	0x00004000
+#define RXDESC_ERROR_SUMMARY	0x00008000
+#define RXDESC_FRAME_LEN_OFFSET	16
+#define RXDESC_FRAME_LEN_MASK	0x3fff0000
+#define RXDESC_DA_FILTER_FAIL	0x40000000
+
+#define RXDESC1_END_RING	0x00008000
+
+#define RXDESC_IP_PAYLOAD_MASK	0x00000003
+#define RXDESC_IP_PAYLOAD_UDP	0x00000001
+#define RXDESC_IP_PAYLOAD_TCP	0x00000002
+#define RXDESC_IP_PAYLOAD_ICMP	0x00000003
+#define RXDESC_IP_HEADER_ERR	0x00000008
+#define RXDESC_IP_PAYLOAD_ERR	0x00000010
+#define RXDESC_IPV4_PACKET	0x00000040
+#define RXDESC_IPV6_PACKET	0x00000080
+#define TXDESC_UNDERFLOW_ERR	0x00000001
+#define TXDESC_JABBER_TIMEOUT	0x00000002
+#define TXDESC_LOCAL_FAULT	0x00000004
+#define TXDESC_REMOTE_FAULT	0x00000008
+#define TXDESC_VLAN_FRAME	0x00000010
+#define TXDESC_FRAME_FLUSHED	0x00000020
+#define TXDESC_IP_HEADER_ERR	0x00000040
+#define TXDESC_PAYLOAD_CSUM_ERR	0x00000080
+#define TXDESC_ERROR_SUMMARY	0x00008000
+#define TXDESC_SA_CTRL_INSERT	0x00040000
+#define TXDESC_SA_CTRL_REPLACE	0x00080000
+#define TXDESC_2ND_ADDR_CHAINED	0x00100000
+#define TXDESC_END_RING		0x00200000
+#define TXDESC_CSUM_IP		0x00400000
+#define TXDESC_CSUM_IP_PAYLD	0x00800000
+#define TXDESC_CSUM_ALL		0x00C00000
+#define TXDESC_CRC_EN_REPLACE	0x01000000
+#define TXDESC_CRC_EN_APPEND	0x02000000
+#define TXDESC_DISABLE_PAD	0x04000000
+#define TXDESC_FIRST_SEG	0x10000000
+#define TXDESC_LAST_SEG		0x20000000
+#define TXDESC_INTERRUPT	0x40000000
+
+#define DESC_OWN		0x80000000
+#define DESC_BUFFER1_SZ_MASK	0x00001fff
+#define DESC_BUFFER2_SZ_MASK	0x1fff0000
+#define DESC_BUFFER2_SZ_OFFSET	16
+
+struct xgmac_dma_desc {
+	__le32 flags;
+	__le32 buf_size;
+	__le32 buf1_addr;		/* Buffer 1 Address Pointer */
+	__le32 buf2_addr;		/* Buffer 2 Address Pointer */
+	__le32 ext_status;
+	__le32 res[3];
+};
+
+struct xgmac_extra_stats {
+	/* Transmit errors */
+	unsigned long tx_jabber;
+	unsigned long tx_frame_flushed;
+	unsigned long tx_payload_error;
+	unsigned long tx_ip_header_error;
+	unsigned long tx_local_fault;
+	unsigned long tx_remote_fault;
+	/* Receive errors */
+	unsigned long rx_watchdog;
+	unsigned long da_rx_filter_fail;
+	unsigned long sa_rx_filter_fail;
+	unsigned long rx_missed_cntr;
+	unsigned long rx_overflow_cntr;
+	unsigned long rx_payload_error;
+	unsigned long rx_ip_header_error;
+	/* Tx/Rx IRQ errors */
+	unsigned long tx_undeflow_irq;
+	unsigned long tx_process_stopped_irq;
+	unsigned long tx_jabber_irq;
+	unsigned long rx_overflow_irq;
+	unsigned long rx_buf_unav_irq;
+	unsigned long rx_process_stopped_irq;
+	unsigned long rx_watchdog_irq;
+	unsigned long tx_early_irq;
+	unsigned long fatal_bus_error_irq;
+};
+
+struct xgmac_priv {
+	struct xgmac_dma_desc *dma_rx;
+	struct sk_buff **rx_skbuff;
+	unsigned int rx_tail;
+	unsigned int rx_head;
+
+	struct xgmac_dma_desc *dma_tx;
+	struct sk_buff **tx_skbuff;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+
+	void __iomem *base;
+	struct sk_buff_head rx_recycle;
+	unsigned int dma_buf_sz;
+	dma_addr_t dma_rx_phy;
+	dma_addr_t dma_tx_phy;
+
+	struct net_device *dev;
+	struct device *device;
+	struct napi_struct napi;
+
+	struct xgmac_extra_stats xstats;
+
+	int pmt_irq;
+	char rx_pause;
+	char tx_pause;
+	int wolopts;
+};
+
+/* XGMAC Configuration Settings */
+#define MAX_MTU			9000
+#define PAUSE_TIME		0x400
+
+#define DMA_RX_RING_SZ		256
+#define DMA_TX_RING_SZ		128
+/* minimum number of free TX descriptors required to wake up TX process */
+#define TX_THRESH		(DMA_TX_RING_SZ/4)
+
+/* DMA descriptor ring helpers */
+#define dma_ring_incr(n, s)	(((n) + 1) & ((s) - 1))
+#define dma_ring_space(h, t, s)	CIRC_SPACE(h, t, s)
+#define dma_ring_cnt(h, t, s)	CIRC_CNT(h, t, s)
+
+/* XGMAC Descriptor Access Helpers */
+static inline void desc_set_buf_len(struct xgmac_dma_desc *p, u32 buf_sz)
+{
+	if (buf_sz > MAX_DESC_BUF_SZ)
+		p->buf_size = cpu_to_le32(MAX_DESC_BUF_SZ |
+			(buf_sz - MAX_DESC_BUF_SZ) << DESC_BUFFER2_SZ_OFFSET);
+	else
+		p->buf_size = cpu_to_le32(buf_sz);
+}
+
+static inline int desc_get_buf_len(struct xgmac_dma_desc *p)
+{
+	u32 len = cpu_to_le32(p->flags);
+	return (len & DESC_BUFFER1_SZ_MASK) +
+		((len & DESC_BUFFER2_SZ_MASK) >> DESC_BUFFER2_SZ_OFFSET);
+}
+
+static inline void desc_init_rx_desc(struct xgmac_dma_desc *p, int ring_size,
+				     int buf_sz)
+{
+	struct xgmac_dma_desc *end = p + ring_size - 1;
+
+	memset(p, 0, sizeof(*p) * ring_size);
+
+	for (; p <= end; p++)
+		desc_set_buf_len(p, buf_sz);
+
+	end->buf_size |= cpu_to_le32(RXDESC1_END_RING);
+}
+
+static inline void desc_init_tx_desc(struct xgmac_dma_desc *p, u32 ring_size)
+{
+	memset(p, 0, sizeof(*p) * ring_size);
+	p[ring_size - 1].flags = cpu_to_le32(TXDESC_END_RING);
+}
+
+static inline int desc_get_owner(struct xgmac_dma_desc *p)
+{
+	return le32_to_cpu(p->flags) & DESC_OWN;
+}
+
+static inline void desc_set_rx_owner(struct xgmac_dma_desc *p)
+{
+	/* Clear all fields and set the owner */
+	p->flags = cpu_to_le32(DESC_OWN);
+}
+
+static inline void desc_set_tx_owner(struct xgmac_dma_desc *p, u32 flags)
+{
+	u32 tmpflags = le32_to_cpu(p->flags);
+	tmpflags &= TXDESC_END_RING;
+	tmpflags |= flags | DESC_OWN;
+	p->flags = cpu_to_le32(tmpflags);
+}
+
+static inline int desc_get_tx_ls(struct xgmac_dma_desc *p)
+{
+	return le32_to_cpu(p->flags) & TXDESC_LAST_SEG;
+}
+
+static inline u32 desc_get_buf_addr(struct xgmac_dma_desc *p)
+{
+	return le32_to_cpu(p->buf1_addr);
+}
+
+static inline void desc_set_buf_addr(struct xgmac_dma_desc *p,
+				     u32 paddr, int len)
+{
+	p->buf1_addr = cpu_to_le32(paddr);
+	if (len > MAX_DESC_BUF_SZ)
+		p->buf2_addr = cpu_to_le32(paddr + MAX_DESC_BUF_SZ);
+}
+
+static inline void desc_set_buf_addr_and_size(struct xgmac_dma_desc *p,
+					      u32 paddr, int len)
+{
+	desc_set_buf_len(p, len);
+	desc_set_buf_addr(p, paddr, len);
+}
+
+static inline int desc_get_rx_frame_len(struct xgmac_dma_desc *p)
+{
+	u32 data = le32_to_cpu(p->flags);
+	u32 len = (data & RXDESC_FRAME_LEN_MASK) >> RXDESC_FRAME_LEN_OFFSET;
+	if (data & RXDESC_FRAME_TYPE)
+		len -= ETH_FCS_LEN;
+
+	return len;
+}
+
+static void xgmac_dma_flush_tx_fifo(void __iomem *ioaddr)
+{
+	int timeout = 1000;
+	u32 reg = readl(ioaddr + XGMAC_OMR);
+	writel(reg | XGMAC_OMR_FTF, ioaddr + XGMAC_OMR);
+
+	while ((timeout-- > 0) && readl(ioaddr + XGMAC_OMR) & XGMAC_OMR_FTF)
+		udelay(1);
+}
+
+static int desc_get_tx_status(struct xgmac_priv *priv, struct xgmac_dma_desc *p)
+{
+	struct xgmac_extra_stats *x = &priv->xstats;
+	u32 status = le32_to_cpu(p->flags);
+
+	if (!(status & TXDESC_ERROR_SUMMARY))
+		return 0;
+
+	netdev_dbg(priv->dev, "tx desc error = 0x%08x\n", status);
+	if (status & TXDESC_JABBER_TIMEOUT)
+		x->tx_jabber++;
+	if (status & TXDESC_FRAME_FLUSHED)
+		x->tx_frame_flushed++;
+	if (status & TXDESC_UNDERFLOW_ERR)
+		xgmac_dma_flush_tx_fifo(priv->base);
+	if (status & TXDESC_IP_HEADER_ERR)
+		x->tx_ip_header_error++;
+	if (status & TXDESC_LOCAL_FAULT)
+		x->tx_local_fault++;
+	if (status & TXDESC_REMOTE_FAULT)
+		x->tx_remote_fault++;
+	if (status & TXDESC_PAYLOAD_CSUM_ERR)
+		x->tx_payload_error++;
+
+	return -1;
+}
+
+static int desc_get_rx_status(struct xgmac_priv *priv, struct xgmac_dma_desc *p)
+{
+	struct xgmac_extra_stats *x = &priv->xstats;
+	int ret = CHECKSUM_UNNECESSARY;
+	u32 status = le32_to_cpu(p->flags);
+	u32 ext_status = le32_to_cpu(p->ext_status);
+
+	if (status & RXDESC_DA_FILTER_FAIL) {
+		netdev_dbg(priv->dev, "XGMAC RX : Dest Address filter fail\n");
+		x->da_rx_filter_fail++;
+		return -1;
+	}
+
+	/* Check if packet has checksum already */
+	if ((status & RXDESC_FRAME_TYPE) && (status & RXDESC_EXT_STATUS) &&
+		!(ext_status & RXDESC_IP_PAYLOAD_MASK))
+		ret = CHECKSUM_NONE;
+
+	netdev_dbg(priv->dev, "rx status - frame type=%d, csum = %d, ext stat %08x\n",
+		   (status & RXDESC_FRAME_TYPE) ? 1 : 0, ret, ext_status);
+
+	if (!(status & RXDESC_ERROR_SUMMARY))
+		return ret;
+
+	/* Handle any errors */
+	if (status & (RXDESC_DESCRIPTOR_ERR | RXDESC_OVERFLOW_ERR |
+		RXDESC_GIANT_FRAME | RXDESC_LENGTH_ERR | RXDESC_CRC_ERR))
+		return -1;
+
+	if (status & RXDESC_EXT_STATUS) {
+		if (ext_status & RXDESC_IP_HEADER_ERR)
+			x->rx_ip_header_error++;
+		if (ext_status & RXDESC_IP_PAYLOAD_ERR)
+			x->rx_payload_error++;
+		netdev_dbg(priv->dev, "IP checksum error - stat %08x\n",
+			   ext_status);
+		return -1;
+	}
+
+	return ret;
+}
+
+static inline void xgmac_mac_enable(void __iomem *ioaddr)
+{
+	u32 value = readl(ioaddr + XGMAC_CONTROL);
+	value |= MAC_ENABLE_RX | MAC_ENABLE_TX;
+	writel(value, ioaddr + XGMAC_CONTROL);
+
+	value = readl(ioaddr + XGMAC_DMA_CONTROL);
+	value |= DMA_CONTROL_ST | DMA_CONTROL_SR;
+	writel(value, ioaddr + XGMAC_DMA_CONTROL);
+}
+
+static inline void xgmac_mac_disable(void __iomem *ioaddr)
+{
+	u32 value = readl(ioaddr + XGMAC_DMA_CONTROL);
+	value &= ~(DMA_CONTROL_ST | DMA_CONTROL_SR);
+	writel(value, ioaddr + XGMAC_DMA_CONTROL);
+
+	value = readl(ioaddr + XGMAC_CONTROL);
+	value &= ~(MAC_ENABLE_TX | MAC_ENABLE_RX);
+	writel(value, ioaddr + XGMAC_CONTROL);
+}
+
+static void xgmac_set_mac_addr(void __iomem *ioaddr, unsigned char *addr,
+			       int num)
+{
+	u32 data;
+
+	data = (addr[5] << 8) | addr[4] | (num ? XGMAC_ADDR_AE : 0);
+	writel(data, ioaddr + XGMAC_ADDR_HIGH(num));
+	data = (addr[3] << 24) | (addr[2] << 16) | (addr[1] << 8) | addr[0];
+	writel(data, ioaddr + XGMAC_ADDR_LOW(num));
+}
+
+static void xgmac_get_mac_addr(void __iomem *ioaddr, unsigned char *addr,
+			       int num)
+{
+	u32 hi_addr, lo_addr;
+
+	/* Read the MAC address from the hardware */
+	hi_addr = readl(ioaddr + XGMAC_ADDR_HIGH(num));
+	lo_addr = readl(ioaddr + XGMAC_ADDR_LOW(num));
+
+	/* Extract the MAC address from the high and low words */
+	addr[0] = lo_addr & 0xff;
+	addr[1] = (lo_addr >> 8) & 0xff;
+	addr[2] = (lo_addr >> 16) & 0xff;
+	addr[3] = (lo_addr >> 24) & 0xff;
+	addr[4] = hi_addr & 0xff;
+	addr[5] = (hi_addr >> 8) & 0xff;
+}
+
+static int xgmac_set_flow_ctrl(struct xgmac_priv *priv, int rx, int tx)
+{
+	u32 reg;
+	unsigned int flow = 0;
+
+	priv->rx_pause = rx;
+	priv->tx_pause = tx;
+
+	if (rx || tx) {
+		if (rx)
+			flow |= XGMAC_FLOW_CTRL_RFE;
+		if (tx)
+			flow |= XGMAC_FLOW_CTRL_TFE;
+
+		flow |= XGMAC_FLOW_CTRL_PLT | XGMAC_FLOW_CTRL_UP;
+		flow |= (PAUSE_TIME << XGMAC_FLOW_CTRL_PT_SHIFT);
+
+		writel(flow, priv->base + XGMAC_FLOW_CTRL);
+
+		reg = readl(priv->base + XGMAC_OMR);
+		reg |= XGMAC_OMR_EFC;
+		writel(reg, priv->base + XGMAC_OMR);
+	} else {
+		writel(0, priv->base + XGMAC_FLOW_CTRL);
+
+		reg = readl(priv->base + XGMAC_OMR);
+		reg &= ~XGMAC_OMR_EFC;
+		writel(reg, priv->base + XGMAC_OMR);
+	}
+
+	return 0;
+}
+
+static void xgmac_rx_refill(struct xgmac_priv *priv)
+{
+	struct xgmac_dma_desc *p;
+	dma_addr_t paddr;
+
+	while (dma_ring_space(priv->rx_head, priv->rx_tail, DMA_RX_RING_SZ) > 1) {
+		int entry = priv->rx_head;
+		struct sk_buff *skb;
+
+		p = priv->dma_rx + entry;
+
+		if (priv->rx_skbuff[entry] != NULL)
+			continue;
+
+		skb = __skb_dequeue(&priv->rx_recycle);
+		if (skb == NULL)
+			skb = netdev_alloc_skb(priv->dev, priv->dma_buf_sz);
+		if (unlikely(skb == NULL))
+			break;
+
+		priv->rx_skbuff[entry] = skb;
+		paddr = dma_map_single(priv->device, skb->data,
+					 priv->dma_buf_sz, DMA_FROM_DEVICE);
+		desc_set_buf_addr(p, paddr, priv->dma_buf_sz);
+
+		netdev_dbg(priv->dev, "rx ring: head %d, tail %d\n",
+			priv->rx_head, priv->rx_tail);
+
+		priv->rx_head = dma_ring_incr(priv->rx_head, DMA_RX_RING_SZ);
+		/* Ensure descriptor is in memory before handing to h/w */
+		wmb();
+		desc_set_rx_owner(p);
+	}
+}
+
+/**
+ * init_xgmac_dma_desc_rings - init the RX/TX descriptor rings
+ * @dev: net device structure
+ * Description:  this function initializes the DMA RX/TX descriptors
+ * and allocates the socket buffers.
+ */
+static int xgmac_dma_desc_rings_init(struct net_device *dev)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+	unsigned int bfsize;
+
+	/* Set the Buffer size according to the MTU;
+	 * indeed, in case of jumbo we need to bump-up the buffer sizes.
+	 */
+	bfsize = ALIGN(dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN + 64,
+		       64);
+
+	netdev_dbg(priv->dev, "mtu [%d] bfsize [%d]\n", dev->mtu, bfsize);
+
+	priv->rx_skbuff = kzalloc(sizeof(struct sk_buff *) * DMA_RX_RING_SZ,
+				  GFP_KERNEL);
+	if (!priv->rx_skbuff)
+		return -ENOMEM;
+
+	priv->dma_rx = dma_alloc_coherent(priv->device,
+					  DMA_RX_RING_SZ *
+					  sizeof(struct xgmac_dma_desc),
+					  &priv->dma_rx_phy,
+					  GFP_KERNEL);
+	if (!priv->dma_rx)
+		goto err_dma_rx;
+
+	priv->tx_skbuff = kzalloc(sizeof(struct sk_buff *) * DMA_TX_RING_SZ,
+				  GFP_KERNEL);
+	if (!priv->tx_skbuff)
+		goto err_tx_skb;
+
+	priv->dma_tx = dma_alloc_coherent(priv->device,
+					  DMA_TX_RING_SZ *
+					  sizeof(struct xgmac_dma_desc),
+					  &priv->dma_tx_phy,
+					  GFP_KERNEL);
+	if (!priv->dma_tx)
+		goto err_dma_tx;
+
+	netdev_dbg(priv->dev, "DMA desc rings: virt addr (Rx %p, "
+	    "Tx %p)\n\tDMA phy addr (Rx 0x%08x, Tx 0x%08x)\n",
+	    priv->dma_rx, priv->dma_tx,
+	    (unsigned int)priv->dma_rx_phy, (unsigned int)priv->dma_tx_phy);
+
+	priv->rx_tail = 0;
+	priv->rx_head = 0;
+	priv->dma_buf_sz = bfsize;
+	desc_init_rx_desc(priv->dma_rx, DMA_RX_RING_SZ, priv->dma_buf_sz);
+	xgmac_rx_refill(priv);
+
+	priv->tx_tail = 0;
+	priv->tx_head = 0;
+	desc_init_tx_desc(priv->dma_tx, DMA_TX_RING_SZ);
+
+	/* The base address of the RX/TX descriptor lists must be written into
+	 * DMA CSR3 and CSR4, respectively. */
+	writel(priv->dma_tx_phy, priv->base + XGMAC_DMA_TX_BASE_ADDR);
+	writel(priv->dma_rx_phy, priv->base + XGMAC_DMA_RX_BASE_ADDR);
+
+	return 0;
+
+err_dma_tx:
+	kfree(priv->tx_skbuff);
+err_tx_skb:
+	dma_free_coherent(priv->device,
+			  DMA_RX_RING_SZ * sizeof(struct xgmac_dma_desc),
+			  priv->dma_rx, priv->dma_rx_phy);
+err_dma_rx:
+	kfree(priv->rx_skbuff);
+	return -ENOMEM;
+}
+
+static void xgmac_free_rx_skbufs(struct xgmac_priv *priv)
+{
+	int i;
+	struct xgmac_dma_desc *p;
+
+	for (i = 0; i < DMA_RX_RING_SZ; i++) {
+		if (priv->rx_skbuff[i] == NULL)
+			continue;
+
+		p = priv->dma_rx + i;
+		dma_unmap_single(priv->device, desc_get_buf_addr(p),
+				 priv->dma_buf_sz, DMA_FROM_DEVICE);
+		dev_kfree_skb_any(priv->rx_skbuff[i]);
+		priv->rx_skbuff[i] = NULL;
+	}
+}
+
+static void xgmac_free_tx_skbufs(struct xgmac_priv *priv)
+{
+	int i, f;
+	struct xgmac_dma_desc *p;
+
+	for (i = 0; i < DMA_TX_RING_SZ; i++) {
+		if (priv->tx_skbuff[i] == NULL)
+			continue;
+
+		p = priv->dma_tx + i;
+		dma_unmap_single(priv->device, desc_get_buf_addr(p),
+				 desc_get_buf_len(p), DMA_TO_DEVICE);
+
+		for (f = 0; f < skb_shinfo(priv->tx_skbuff[i])->nr_frags; f++) {
+			p = priv->dma_tx + i++;
+			dma_unmap_page(priv->device, desc_get_buf_addr(p),
+				       desc_get_buf_len(p), DMA_TO_DEVICE);
+		}
+
+		dev_kfree_skb_any(priv->tx_skbuff[i]);
+		priv->tx_skbuff[i] = NULL;
+	}
+}
+
+static void xgmac_free_dma_desc_rings(struct xgmac_priv *priv)
+{
+	/* Release the DMA TX/RX socket buffers */
+	xgmac_free_rx_skbufs(priv);
+	xgmac_free_tx_skbufs(priv);
+
+	/* Free the consistent memory allocated for descriptor rings */
+	dma_free_coherent(priv->device,
+			  DMA_TX_RING_SZ * sizeof(struct xgmac_dma_desc),
+			  priv->dma_tx, priv->dma_tx_phy);
+	priv->dma_tx = NULL;
+	dma_free_coherent(priv->device,
+			  DMA_RX_RING_SZ * sizeof(struct xgmac_dma_desc),
+			  priv->dma_rx, priv->dma_rx_phy);
+	priv->dma_rx = NULL;
+
+	kfree(priv->rx_skbuff);
+	priv->rx_skbuff = NULL;
+	kfree(priv->tx_skbuff);
+	priv->tx_skbuff = NULL;
+}
+
+/**
+ * xgmac_tx:
+ * @priv: private driver structure
+ * Description: it reclaims resources after transmission completes.
+ */
+static void xgmac_tx_complete(struct xgmac_priv *priv)
+{
+	int i;
+	void __iomem *ioaddr = priv->base;
+
+	writel(DMA_STATUS_TU | DMA_STATUS_NIS, ioaddr + XGMAC_DMA_STATUS);
+
+	while (dma_ring_cnt(priv->tx_head, priv->tx_tail, DMA_TX_RING_SZ)) {
+		unsigned int entry = priv->tx_tail;
+		struct sk_buff *skb = priv->tx_skbuff[entry];
+		struct xgmac_dma_desc *p = priv->dma_tx + entry;
+
+		/* Check if the descriptor is owned by the DMA. */
+		if (desc_get_owner(p))
+			break;
+
+		/* Verify tx error by looking at the last segment */
+		if (desc_get_tx_ls(p))
+			desc_get_tx_status(priv, p);
+
+		netdev_dbg(priv->dev, "tx ring: curr %d, dirty %d\n",
+			priv->tx_head, priv->tx_tail);
+
+		dma_unmap_single(priv->device, desc_get_buf_addr(p),
+				 desc_get_buf_len(p), DMA_TO_DEVICE);
+
+		priv->tx_skbuff[entry] = NULL;
+		priv->tx_tail = dma_ring_incr(entry, DMA_TX_RING_SZ);
+
+		if (!skb) {
+			continue;
+		}
+
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+			entry = priv->tx_tail = dma_ring_incr(priv->tx_tail,
+							      DMA_TX_RING_SZ);
+			p = priv->dma_tx + priv->tx_tail;
+
+			dma_unmap_page(priv->device, desc_get_buf_addr(p),
+				       desc_get_buf_len(p), DMA_TO_DEVICE);
+		}
+
+		/*
+		 * If there's room in the queue (limit it to size)
+		 * we add this skb back into the pool,
+		 * if it's the right size.
+		 */
+		if ((skb_queue_len(&priv->rx_recycle) <
+			DMA_RX_RING_SZ) &&
+			skb_recycle_check(skb, priv->dma_buf_sz))
+			__skb_queue_head(&priv->rx_recycle, skb);
+		else
+			dev_kfree_skb(skb);
+	}
+
+	if (dma_ring_space(priv->tx_head, priv->tx_tail, DMA_TX_RING_SZ) >
+	    TX_THRESH)
+		netif_wake_queue(priv->dev);
+}
+
+/**
+ * xgmac_tx_err:
+ * @priv: pointer to the private device structure
+ * Description: it cleans the descriptors and restarts the transmission
+ * in case of errors.
+ */
+static void xgmac_tx_err(struct xgmac_priv *priv)
+{
+	u32 reg, value, inten;
+
+	netif_stop_queue(priv->dev);
+
+	inten = readl(priv->base + XGMAC_DMA_INTR_ENA);
+	writel(0, priv->base + XGMAC_DMA_INTR_ENA);
+
+	reg = readl(priv->base + XGMAC_DMA_CONTROL);
+	writel(reg & ~DMA_CONTROL_ST, priv->base + XGMAC_DMA_CONTROL);
+	do {
+		value = readl(priv->base + XGMAC_DMA_STATUS) & 0x700000;
+	} while (value && (value != 0x600000));
+
+	xgmac_free_tx_skbufs(priv);
+	desc_init_tx_desc(priv->dma_tx, DMA_TX_RING_SZ);
+	priv->tx_tail = 0;
+	priv->tx_head = 0;
+	writel(reg | DMA_CONTROL_ST, priv->base + XGMAC_DMA_CONTROL);
+
+	writel(DMA_STATUS_TU | DMA_STATUS_TPS | DMA_STATUS_NIS | DMA_STATUS_AIS,
+		priv->base + XGMAC_DMA_STATUS);
+	writel(inten, priv->base + XGMAC_DMA_INTR_ENA);
+
+	netif_wake_queue(priv->dev);
+}
+
+static int xgmac_hw_init(struct net_device *dev)
+{
+	u32 value, ctrl;
+	int limit;
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void __iomem *ioaddr = priv->base;
+
+	/* Save the ctrl register value */
+	ctrl = readl(ioaddr + XGMAC_CONTROL) & XGMAC_CONTROL_SPD_MASK;
+
+	/* SW reset */
+	value = DMA_BUS_MODE_SFT_RESET;
+	writel(value, ioaddr + XGMAC_DMA_BUS_MODE);
+	limit = 15000;
+	while (limit-- &&
+		(readl(ioaddr + XGMAC_DMA_BUS_MODE) & DMA_BUS_MODE_SFT_RESET))
+		cpu_relax();
+	if (limit < 0)
+		return -EBUSY;
+
+	value = (0x10 << DMA_BUS_MODE_PBL_SHIFT) |
+		(0x10 << DMA_BUS_MODE_RPBL_SHIFT) |
+		DMA_BUS_MODE_FB | DMA_BUS_MODE_ATDS | DMA_BUS_MODE_AAL;
+	writel(value, ioaddr + XGMAC_DMA_BUS_MODE);
+
+	/* Enable interrupts */
+	writel(DMA_INTR_DEFAULT_MASK, ioaddr + XGMAC_DMA_STATUS);
+	writel(DMA_INTR_DEFAULT_MASK, ioaddr + XGMAC_DMA_INTR_ENA);
+
+	/* XGMAC requires AXI bus init. This is a 'magic number' for now */
+	writel(0x000100E, ioaddr + XGMAC_DMA_AXI_BUS);
+
+	ctrl |= XGMAC_CONTROL_DDIC | XGMAC_CONTROL_JE | XGMAC_CONTROL_ACS |
+		XGMAC_CONTROL_CAR;
+	if (dev->features & NETIF_F_RXCSUM)
+		ctrl |= XGMAC_CONTROL_IPC;
+	writel(ctrl, ioaddr + XGMAC_CONTROL);
+
+	value = DMA_CONTROL_DFF;
+	writel(value, ioaddr + XGMAC_DMA_CONTROL);
+
+	/* Set the HW DMA mode and the COE */
+	writel(XGMAC_OMR_TSF | XGMAC_OMR_RSF | XGMAC_OMR_RFD | XGMAC_OMR_RFA,
+		ioaddr + XGMAC_OMR);
+
+	/* Reset the MMC counters */
+	writel(1, ioaddr + XGMAC_MMC_CTRL);
+	return 0;
+}
+
+/**
+ *  xgmac_open - open entry point of the driver
+ *  @dev : pointer to the device structure.
+ *  Description:
+ *  This function is the open entry point of the driver.
+ *  Return value:
+ *  0 on success and an appropriate (-)ve integer as defined in errno.h
+ *  file on failure.
+ */
+static int xgmac_open(struct net_device *dev)
+{
+	int ret;
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void __iomem *ioaddr = priv->base;
+
+	/* Check that the MAC address is valid.  If its not, refuse
+	 * to bring the device up. The user must specify an
+	 * address using the following linux command:
+	 *      ifconfig eth0 hw ether xx:xx:xx:xx:xx:xx  */
+	if (!is_valid_ether_addr(dev->dev_addr)) {
+		random_ether_addr(dev->dev_addr);
+		netdev_dbg(priv->dev, "generated random MAC address %pM\n",
+			dev->dev_addr);
+	}
+
+	skb_queue_head_init(&priv->rx_recycle);
+	memset(&priv->xstats, 0, sizeof(struct xgmac_extra_stats));
+
+	/* Initialize the XGMAC and descriptors */
+	xgmac_hw_init(dev);
+	xgmac_set_mac_addr(ioaddr, dev->dev_addr, 0);
+	xgmac_set_flow_ctrl(priv, priv->rx_pause, priv->tx_pause);
+
+	ret = xgmac_dma_desc_rings_init(dev);
+	if (ret < 0)
+		return ret;
+
+	/* Enable the MAC Rx/Tx */
+	xgmac_mac_enable(ioaddr);
+
+	napi_enable(&priv->napi);
+	netif_start_queue(dev);
+
+	enable_irq(dev->irq);
+
+	return 0;
+}
+
+/**
+ *  xgmac_release - close entry point of the driver
+ *  @dev : device pointer.
+ *  Description:
+ *  This is the stop entry point of the driver.
+ */
+static int xgmac_release(struct net_device *dev)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+
+	netif_stop_queue(dev);
+
+	disable_irq(dev->irq);
+	napi_disable(&priv->napi);
+	skb_queue_purge(&priv->rx_recycle);
+
+	/* Disable the MAC core */
+	xgmac_mac_disable(priv->base);
+
+	/* Release and free the Rx/Tx resources */
+	xgmac_free_dma_desc_rings(priv);
+
+	return 0;
+}
+
+/**
+ *  xgmac_xmit:
+ *  @skb : the socket buffer
+ *  @dev : device pointer
+ *  Description : Tx entry point of the driver.
+ */
+static netdev_tx_t xgmac_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+	unsigned int entry;
+	int i;
+	int nfrags = skb_shinfo(skb)->nr_frags;
+	struct xgmac_dma_desc *desc, *first;
+	unsigned int desc_flags;
+	unsigned int len;
+	dma_addr_t paddr;
+
+	if (dma_ring_space(priv->tx_head, priv->tx_tail, DMA_TX_RING_SZ) <
+	    (nfrags + 1)) {
+		writel(DMA_INTR_DEFAULT_MASK | DMA_INTR_ENA_TIE,
+			priv->base + XGMAC_DMA_INTR_ENA);
+		netif_stop_queue(dev);
+		return NETDEV_TX_BUSY;
+	}
+
+	desc_flags = (skb->ip_summed == CHECKSUM_PARTIAL) ?
+		TXDESC_CSUM_ALL : 0;
+	entry = priv->tx_head;
+	desc = priv->dma_tx + entry;
+	first = desc;
+
+	priv->tx_skbuff[entry] = skb;
+	len = skb_headlen(skb);
+	paddr = dma_map_single(priv->device, skb->data, len, DMA_TO_DEVICE);
+	desc_set_buf_addr_and_size(desc, paddr, len);
+
+	for (i = 0; i < nfrags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		len = frag->size;
+		entry = dma_ring_incr(entry, DMA_TX_RING_SZ);
+		desc = priv->dma_tx + entry;
+
+		paddr = dma_map_page(priv->device, frag->page.p,
+				frag->page_offset, len, DMA_TO_DEVICE);
+		priv->tx_skbuff[entry] = NULL;
+
+		desc_set_buf_addr_and_size(desc, paddr, len);
+		if (i < (nfrags - 1))
+			desc_set_tx_owner(desc, desc_flags);
+	}
+
+	/* Interrupt on completition only for the latest segment */
+	if (desc != first)
+		desc_set_tx_owner(desc, desc_flags |
+			TXDESC_LAST_SEG | TXDESC_INTERRUPT);
+	else
+		desc_flags |= TXDESC_LAST_SEG | TXDESC_INTERRUPT;
+
+	/* Set owner on first desc last to avoid race condition */
+	wmb();
+	desc_set_tx_owner(first, desc_flags | TXDESC_FIRST_SEG);
+
+	priv->tx_head = dma_ring_incr(entry, DMA_TX_RING_SZ);
+
+	writel(1, priv->base + XGMAC_DMA_TX_POLL);
+
+	return NETDEV_TX_OK;
+}
+
+static int xgmac_rx(struct xgmac_priv *priv, int limit)
+{
+	unsigned int entry;
+	unsigned int count = 0;
+	struct xgmac_dma_desc *p;
+
+	while (count < limit) {
+		int ip_checksum;
+		struct sk_buff *skb;
+		int frame_len;
+
+		writel(DMA_STATUS_RI | DMA_STATUS_NIS,
+		       priv->base + XGMAC_DMA_STATUS);
+
+		entry = priv->rx_tail;
+		p = priv->dma_rx + entry;
+		if (desc_get_owner(p))
+			break;
+
+		count++;
+		priv->rx_tail = dma_ring_incr(priv->rx_tail, DMA_RX_RING_SZ);
+
+		/* read the status of the incoming frame */
+		ip_checksum = desc_get_rx_status(priv, p);
+		if (ip_checksum < 0)
+			continue;
+
+		skb = priv->rx_skbuff[entry];
+		if (unlikely(!skb)) {
+			netdev_err(priv->dev, "Inconsistent Rx descriptor chain\n");
+			break;
+		}
+		priv->rx_skbuff[entry] = NULL;
+
+		frame_len = desc_get_rx_frame_len(p);
+		netdev_dbg(priv->dev, "RX frame size %d, COE status: %d\n",
+			frame_len, ip_checksum);
+
+		skb_put(skb, frame_len);
+		dma_unmap_single(priv->device, desc_get_buf_addr(p),
+				 frame_len, DMA_FROM_DEVICE);
+
+		skb->protocol = eth_type_trans(skb, priv->dev);
+		skb->ip_summed = ip_checksum;
+		if (ip_checksum == CHECKSUM_NONE)
+			netif_receive_skb(skb);
+		else
+			napi_gro_receive(&priv->napi, skb);
+	}
+
+	xgmac_rx_refill(priv);
+
+	writel(1, priv->base + XGMAC_DMA_RX_POLL);
+
+	return count;
+}
+
+/**
+ *  xgmac_poll - xgmac poll method (NAPI)
+ *  @napi : pointer to the napi structure.
+ *  @budget : maximum number of packets that the current CPU can receive from
+ *	      all interfaces.
+ *  Description :
+ *   This function implements the the reception process.
+ *   Also it runs the TX completion thread
+ */
+static int xgmac_poll(struct napi_struct *napi, int budget)
+{
+	struct xgmac_priv *priv = container_of(napi,
+				       struct xgmac_priv, napi);
+	int work_done = 0;
+
+	xgmac_tx_complete(priv);
+	work_done = xgmac_rx(priv, budget);
+
+	if (work_done < budget) {
+		napi_complete(napi);
+		writel(DMA_INTR_DEFAULT_MASK, priv->base + XGMAC_DMA_INTR_ENA);
+	}
+	return work_done;
+}
+
+/**
+ *  xgmac_tx_timeout
+ *  @dev : Pointer to net device structure
+ *  Description: this function is called when a packet transmission fails to
+ *   complete within a reasonable tmrate. The driver will mark the error in the
+ *   netdev structure and arrange for the device to be reset to a sane state
+ *   in order to transmit a new packet.
+ */
+static void xgmac_tx_timeout(struct net_device *dev)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+
+	/* Clear Tx resources and restart transmitting again */
+	xgmac_tx_err(priv);
+}
+
+/**
+ *  xgmac_set_rx_mode - entry point for multicast addressing
+ *  @dev : pointer to the device structure
+ *  Description:
+ *  This function is a driver entry point which gets called by the kernel
+ *  whenever multicast addresses must be enabled/disabled.
+ *  Return value:
+ *  void.
+ */
+static void xgmac_set_rx_mode(struct net_device *dev)
+{
+	int i;
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void __iomem *ioaddr = priv->base;
+	unsigned int value = 0;
+	u32 mc_filter[XGMAC_NUM_HASH];
+	int reg = 1;
+	struct netdev_hw_addr *ha;
+	bool use_hash = false;
+
+	netdev_dbg(priv->dev, "# mcasts %d, # unicast %d\n",
+		 netdev_mc_count(dev), netdev_uc_count(dev));
+
+	if (dev->flags & IFF_PROMISC) {
+		writel(XGMAC_FRAME_FILTER_PR, ioaddr + XGMAC_FRAME_FILTER);
+		return;
+	}
+
+	memset(mc_filter, 0, sizeof(mc_filter));
+
+	if (netdev_uc_count(dev) > XGMAC_MAX_FILTER_ADDR) {
+		use_hash = true;
+		value |= XGMAC_FRAME_FILTER_HUC | XGMAC_FRAME_FILTER_HPF;
+	}
+	netdev_for_each_uc_addr(ha, dev) {
+		if (use_hash) {
+			u32 bit_nr = ~ether_crc(ETH_ALEN, ha->addr) >> 23;
+
+			/* The most significant 4 bits determine the register to
+			 * use (H/L) while the other 5 bits determine the bit
+			 * within the register. */
+			mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
+		} else {
+			xgmac_set_mac_addr(ioaddr, ha->addr, reg);
+			reg++;
+		}
+	}
+
+	if (dev->flags & IFF_ALLMULTI) {
+		value |= XGMAC_FRAME_FILTER_PM;
+		goto out;
+	}
+
+	if ((netdev_mc_count(dev) + reg - 1) > XGMAC_MAX_FILTER_ADDR) {
+		use_hash = true;
+		value |= XGMAC_FRAME_FILTER_HMC | XGMAC_FRAME_FILTER_HPF;
+	}
+	netdev_for_each_mc_addr(ha, dev) {
+		if (use_hash) {
+			u32 bit_nr = ~ether_crc(ETH_ALEN, ha->addr) >> 23;
+
+			/* The most significant 4 bits determine the register to
+			 * use (H/L) while the other 5 bits determine the bit
+			 * within the register. */
+			mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
+		} else {
+			xgmac_set_mac_addr(ioaddr, ha->addr, reg);
+			reg++;
+		}
+	}
+
+out:
+	for (i = 0; i < XGMAC_NUM_HASH; i++)
+		writel(mc_filter[i], ioaddr + XGMAC_HASH(i));
+
+	writel(value, ioaddr + XGMAC_FRAME_FILTER);
+}
+
+/**
+ *  xgmac_change_mtu - entry point to change MTU size for the device.
+ *  @dev : device pointer.
+ *  @new_mtu : the new MTU size for the device.
+ *  Description: the Maximum Transfer Unit (MTU) is used by the network layer
+ *  to drive packet transmission. Ethernet has an MTU of 1500 octets
+ *  (ETH_DATA_LEN). This value can be changed with ifconfig.
+ *  Return value:
+ *  0 on success and an appropriate (-)ve integer as defined in errno.h
+ *  file on failure.
+ */
+static int xgmac_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+	int old_mtu;
+
+	if ((new_mtu < 46) || (new_mtu > MAX_MTU)) {
+		netdev_err(priv->dev, "invalid MTU, max MTU is: %d\n", MAX_MTU);
+		return -EINVAL;
+	}
+
+	old_mtu = dev->mtu;
+	dev->mtu = new_mtu;
+
+	/* return early if the buffer sizes will not change */
+	if (old_mtu <= ETH_DATA_LEN && new_mtu <= ETH_DATA_LEN)
+		return 0;
+	if (old_mtu == new_mtu)
+		return 0;
+
+	/* Stop everything, get ready to change the MTU */
+	if (!netif_running(dev))
+		return 0;
+
+	/* Bring the interface down and then back up */
+	xgmac_release(dev);
+	xgmac_open(dev);
+
+	return 0;
+}
+
+static irqreturn_t xgmac_pmt_interrupt(int irq, void *dev_id)
+{
+	u32 intr_status;
+	struct net_device *dev = (struct net_device *)dev_id;
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void __iomem *ioaddr = priv->base;
+
+	intr_status = readl(ioaddr + XGMAC_INT_STAT);
+	if (intr_status & XGMAC_INT_STAT_PMT) {
+		netdev_dbg(priv->dev, "received Magic frame\n");
+		/* clear the PMT bits 5 and 6 by reading the PMT */
+		readl(ioaddr + XGMAC_PMT);
+	}
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t xgmac_interrupt(int irq, void *dev_id)
+{
+	u32 intr_status;
+	bool tx_err = false;
+	struct net_device *dev = (struct net_device *)dev_id;
+	struct xgmac_priv *priv = netdev_priv(dev);
+	struct xgmac_extra_stats *x = &priv->xstats;
+
+	/* read the status register (CSR5) */
+	intr_status = readl(priv->base + XGMAC_DMA_STATUS);
+	intr_status &= readl(priv->base + XGMAC_DMA_INTR_ENA);
+	writel(intr_status, priv->base + XGMAC_DMA_STATUS);
+
+	/* It displays the DMA process states (CSR5 register) */
+	/* ABNORMAL interrupts */
+	if (unlikely(intr_status & DMA_STATUS_AIS)) {
+		if (intr_status & DMA_STATUS_UNF) {
+			netdev_err(priv->dev, "transmit underflow\n");
+			tx_err = true;
+			x->tx_undeflow_irq++;
+		}
+		if (intr_status & DMA_STATUS_TJT) {
+			netdev_err(priv->dev, "transmit jabber\n");
+			x->tx_jabber_irq++;
+		}
+		if (intr_status & DMA_STATUS_OVF) {
+			netdev_err(priv->dev, "recv overflow\n");
+			x->rx_overflow_irq++;
+		}
+		if (intr_status & DMA_STATUS_RU)
+			x->rx_buf_unav_irq++;
+		if (intr_status & DMA_STATUS_RPS) {
+			netdev_err(priv->dev, "receive process stopped\n");
+			x->rx_process_stopped_irq++;
+		}
+		if (intr_status & DMA_STATUS_RWT) {
+			netdev_err(priv->dev, "receive watchdog\n");
+			x->rx_watchdog_irq++;
+		}
+		if (intr_status & DMA_STATUS_ETI) {
+			netdev_err(priv->dev, "transmit early interrupt\n");
+			x->tx_early_irq++;
+		}
+		if (intr_status & DMA_STATUS_TPS) {
+			netdev_err(priv->dev, "transmit process stopped\n");
+			x->tx_process_stopped_irq++;
+			tx_err = true;
+		}
+		if (intr_status & DMA_STATUS_FBI) {
+			netdev_err(priv->dev, "fatal bus error\n");
+			x->fatal_bus_error_irq++;
+			tx_err = true;
+		}
+
+		if (tx_err)
+			xgmac_tx_err(priv);
+	}
+
+	/* TX/RX NORMAL interrupts */
+	if (intr_status & (DMA_STATUS_RI | DMA_STATUS_TU)) {
+		writel(DMA_INTR_ABNORMAL, priv->base + XGMAC_DMA_INTR_ENA);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/* Polling receive - used by NETCONSOLE and other diagnostic tools
+ * to allow network I/O with interrupts disabled. */
+static void xgmac_poll_controller(struct net_device *dev)
+{
+	disable_irq(dev->irq);
+	xgmac_interrupt(dev->irq, dev);
+	enable_irq(dev->irq);
+}
+#endif
+
+struct rtnl_link_stats64 *
+xgmac_get_stats64(struct net_device *dev,
+		       struct rtnl_link_stats64 *storage)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void __iomem *base = priv->base;
+	u64 count;
+
+	storage->rx_packets = readl(base + XGMAC_MMC_RXFRAME_GB_LO);
+	storage->rx_packets |=
+		(u64)(readl(base + XGMAC_MMC_RXFRAME_GB_HI)) << 32;
+	storage->rx_bytes = readl(base + XGMAC_MMC_RXOCTET_G_LO);
+	storage->rx_bytes |= (u64)(readl(base + XGMAC_MMC_RXOCTET_G_HI)) << 32;
+
+	storage->multicast = readl(base + XGMAC_MMC_RXMCFRAME_G);
+	storage->rx_crc_errors = readl(base + XGMAC_MMC_RXCRCERR);
+	storage->rx_length_errors = readl(base + XGMAC_MMC_RXLENGTHERR);
+	storage->rx_missed_errors = readl(base + XGMAC_MMC_RXOVERFLOW);
+
+	storage->tx_packets = readl(base + XGMAC_MMC_TXFRAME_GB_LO);
+	storage->tx_packets |=
+		(u64)(readl(base + XGMAC_MMC_TXFRAME_GB_HI)) << 32;
+	storage->tx_bytes = readl(base + XGMAC_MMC_TXOCTET_G_LO);
+	storage->tx_bytes |= (u64)(readl(base + XGMAC_MMC_TXOCTET_G_HI)) << 32;
+
+	count = readl(base + XGMAC_MMC_TXFRAME_G_LO);
+	count |= (__u64)(readl(base + XGMAC_MMC_TXFRAME_G_HI)) << 32;
+	storage->tx_errors = storage->tx_packets - count;
+
+	storage->tx_fifo_errors = readl(base + XGMAC_MMC_TXUNDERFLOW);
+
+	return storage;
+}
+
+static int xgmac_set_mac_address(struct net_device *dev, void *p)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void __iomem *ioaddr = priv->base;
+	struct sockaddr *addr = p;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+
+	xgmac_set_mac_addr(ioaddr, dev->dev_addr, 0);
+
+	return 0;
+}
+
+static int xgmac_set_features(struct net_device *dev, u32 features)
+{
+	u32 ctrl;
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void __iomem *ioaddr = priv->base;
+	u32 changed = dev->features ^ features;
+
+	if (!(changed & NETIF_F_RXCSUM))
+		return 0;
+
+	ctrl = readl(ioaddr + XGMAC_CONTROL);
+	if (features & NETIF_F_RXCSUM)
+		ctrl |= XGMAC_CONTROL_IPC;
+	else
+		ctrl &= ~XGMAC_CONTROL_IPC;
+	writel(ctrl, ioaddr + XGMAC_CONTROL);
+
+	return 0;
+}
+
+static const struct net_device_ops xgmac_netdev_ops = {
+	.ndo_open = xgmac_open,
+	.ndo_start_xmit = xgmac_xmit,
+	.ndo_stop = xgmac_release,
+	.ndo_change_mtu = xgmac_change_mtu,
+	.ndo_set_rx_mode = xgmac_set_rx_mode,
+	.ndo_tx_timeout = xgmac_tx_timeout,
+	.ndo_get_stats64 = xgmac_get_stats64,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller = xgmac_poll_controller,
+#endif
+	.ndo_set_mac_address = xgmac_set_mac_address,
+	.ndo_set_features = xgmac_set_features,
+};
+
+static int xgmac_ethtool_getsettings(struct net_device *dev,
+					  struct ethtool_cmd *cmd)
+{
+	cmd->autoneg = 0;
+	cmd->duplex = DUPLEX_FULL;
+	ethtool_cmd_speed_set(cmd, 10000);
+	cmd->supported = SUPPORTED_10000baseT_Full;
+	cmd->advertising = 0;
+	cmd->transceiver = XCVR_INTERNAL;
+	return 0;
+}
+
+static void xgmac_get_pauseparam(struct net_device *netdev,
+				      struct ethtool_pauseparam *pause)
+{
+	struct xgmac_priv *priv = netdev_priv(netdev);
+
+	pause->rx_pause = priv->rx_pause;
+	pause->tx_pause = priv->tx_pause;
+}
+
+static int xgmac_set_pauseparam(struct net_device *netdev,
+				     struct ethtool_pauseparam *pause)
+{
+	struct xgmac_priv *priv = netdev_priv(netdev);
+	return xgmac_set_flow_ctrl(priv, pause->rx_pause, pause->tx_pause);
+}
+
+struct xgmac_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int stat_offset;
+	bool is_reg;
+};
+
+#define XGMAC_STAT(m)	\
+	{ #m, offsetof(struct xgmac_priv, xstats.m), false }
+#define XGMAC_HW_STAT(m, reg_offset)	\
+	{ #m, reg_offset, true }
+
+static const struct xgmac_stats xgmac_gstrings_stats[] = {
+	XGMAC_STAT(tx_frame_flushed),
+	XGMAC_STAT(tx_payload_error),
+	XGMAC_STAT(tx_ip_header_error),
+	XGMAC_STAT(tx_local_fault),
+	XGMAC_STAT(tx_remote_fault),
+	XGMAC_STAT(rx_watchdog),
+	XGMAC_STAT(da_rx_filter_fail),
+	XGMAC_STAT(sa_rx_filter_fail),
+	XGMAC_STAT(rx_missed_cntr),
+	XGMAC_STAT(rx_overflow_cntr),
+	XGMAC_STAT(tx_undeflow_irq),
+	XGMAC_STAT(tx_process_stopped_irq),
+	XGMAC_STAT(tx_jabber_irq),
+	XGMAC_STAT(rx_overflow_irq),
+	XGMAC_STAT(rx_buf_unav_irq),
+	XGMAC_STAT(rx_process_stopped_irq),
+	XGMAC_STAT(rx_watchdog_irq),
+	XGMAC_STAT(rx_payload_error),
+	XGMAC_STAT(rx_ip_header_error),
+	XGMAC_STAT(tx_early_irq),
+	XGMAC_STAT(fatal_bus_error_irq),
+	XGMAC_HW_STAT(tx_vlan, XGMAC_MMC_TXVLANFRAME),
+	XGMAC_HW_STAT(rx_vlan, XGMAC_MMC_RXVLANFRAME),
+	XGMAC_HW_STAT(tx_pause, XGMAC_MMC_TXPAUSEFRAME),
+	XGMAC_HW_STAT(rx_pause, XGMAC_MMC_RXPAUSEFRAME),
+};
+#define XGMAC_STATS_LEN ARRAY_SIZE(xgmac_gstrings_stats)
+
+static void xgmac_get_ethtool_stats(struct net_device *dev,
+					 struct ethtool_stats *dummy,
+					 u64 *data)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+	void *p = priv;
+	int i;
+
+	for (i = 0; i < XGMAC_STATS_LEN; i++) {
+		if (xgmac_gstrings_stats[i].is_reg)
+			*data++ = readl(priv->base +
+				xgmac_gstrings_stats[i].stat_offset);
+		else
+			*data++ = *(u32 *)(p +
+				xgmac_gstrings_stats[i].stat_offset);
+	}
+}
+
+static int xgmac_get_sset_count(struct net_device *netdev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return XGMAC_STATS_LEN;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void xgmac_get_strings(struct net_device *dev, u32 stringset,
+				   u8 *data)
+{
+	int i;
+	u8 *p = data;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for (i = 0; i < XGMAC_STATS_LEN; i++) {
+			memcpy(p, xgmac_gstrings_stats[i].stat_string,
+			       ETH_GSTRING_LEN);
+			p += ETH_GSTRING_LEN;
+		}
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+}
+
+static void xgmac_get_wol(struct net_device *dev,
+			       struct ethtool_wolinfo *wol)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+
+	if (device_can_wakeup(priv->device)) {
+		wol->supported = WAKE_MAGIC | WAKE_UCAST;
+		wol->wolopts = priv->wolopts;
+	}
+}
+
+static int xgmac_set_wol(struct net_device *dev,
+			      struct ethtool_wolinfo *wol)
+{
+	struct xgmac_priv *priv = netdev_priv(dev);
+	u32 support = WAKE_MAGIC | WAKE_UCAST;
+
+	if (!device_can_wakeup(priv->device))
+		return -EINVAL;
+
+	if (wol->wolopts & ~support)
+		return -EINVAL;
+
+	priv->wolopts = wol->wolopts;
+
+	if (wol->wolopts) {
+		device_set_wakeup_enable(priv->device, 1);
+		enable_irq_wake(dev->irq);
+	} else {
+		device_set_wakeup_enable(priv->device, 0);
+		disable_irq_wake(dev->irq);
+	}
+
+	return 0;
+}
+
+static struct ethtool_ops xgmac_ethtool_ops = {
+	.get_settings = xgmac_ethtool_getsettings,
+	.get_link = ethtool_op_get_link,
+	.get_pauseparam = xgmac_get_pauseparam,
+	.set_pauseparam = xgmac_set_pauseparam,
+	.get_ethtool_stats = xgmac_get_ethtool_stats,
+	.get_strings = xgmac_get_strings,
+	.get_wol = xgmac_get_wol,
+	.set_wol = xgmac_set_wol,
+	.get_sset_count = xgmac_get_sset_count,
+};
+
+/**
+ * xgmac_probe
+ * @pdev: platform device pointer
+ * Description: the driver is initialized through platform_device.
+ */
+static int xgmac_probe(struct platform_device *pdev)
+{
+	int ret = 0;
+	struct resource *res;
+	struct net_device *ndev = NULL;
+	struct xgmac_priv *priv = NULL;
+	u32 uid;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res)
+		return -ENODEV;
+
+	if (!request_mem_region(res->start, resource_size(res), pdev->name))
+		return -EBUSY;
+
+	ndev = alloc_etherdev(sizeof(struct xgmac_priv));
+	if (!ndev) {
+		ret = -ENOMEM;
+		goto err_alloc;
+	}
+
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+	priv = netdev_priv(ndev);
+	platform_set_drvdata(pdev, ndev);
+	ether_setup(ndev);
+	ndev->netdev_ops = &xgmac_netdev_ops;
+	SET_ETHTOOL_OPS(ndev, &xgmac_ethtool_ops);
+
+	priv->device = &pdev->dev;
+	priv->dev = ndev;
+	priv->rx_pause = 1;
+	priv->tx_pause = 1;
+
+	priv->base = ioremap(res->start, resource_size(res));
+	if (!priv->base) {
+		netdev_err(ndev, "ioremap failed\n");
+		ret = -ENOMEM;
+		goto err_io;
+	}
+
+	uid = readl(priv->base + XGMAC_VERSION);
+	netdev_info(ndev, "h/w version is 0x%x\n", uid);
+
+	ndev->irq = platform_get_irq(pdev, 0);
+	if (ndev->irq == -ENXIO) {
+		netdev_err(ndev, "No irq resource\n");
+		ret = ndev->irq;
+		goto err_irq;
+	}
+
+	ret = request_irq(ndev->irq, xgmac_interrupt, 0,
+			  dev_name(&pdev->dev), ndev);
+	if (ret < 0) {
+		netdev_err(ndev, "Could not request irq %d - ret %d)\n",
+			ndev->irq, ret);
+		goto err_irq;
+	}
+	disable_irq(ndev->irq);
+
+	priv->pmt_irq = platform_get_irq(pdev, 1);
+	if (priv->pmt_irq == -ENXIO) {
+		netdev_err(ndev, "No pmt irq resource\n");
+		ret = priv->pmt_irq;
+		goto err_pmt_irq;
+	}
+
+	ret = request_irq(priv->pmt_irq, xgmac_pmt_interrupt, 0,
+			  dev_name(&pdev->dev), ndev);
+	if (ret < 0) {
+		netdev_err(ndev, "Could not request irq %d - ret %d)\n",
+			priv->pmt_irq, ret);
+		goto err_pmt_irq;
+	}
+
+	device_set_wakeup_capable(&pdev->dev, 1);
+	if (device_can_wakeup(priv->device))
+		priv->wolopts = WAKE_MAGIC;	/* Magic Frame as default */
+
+	ndev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA;
+	if (readl(priv->base + XGMAC_DMA_HW_FEATURE) & DMA_HW_FEAT_TXCOESEL)
+		ndev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+				     NETIF_F_RXCSUM;
+	ndev->features |= ndev->hw_features;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+
+	/* Get the MAC address */
+	xgmac_get_mac_addr(priv->base, ndev->dev_addr, 0);
+	if (!is_valid_ether_addr(ndev->dev_addr))
+		netdev_warn(ndev, "MAC address %pM not valid",
+			 ndev->dev_addr);
+
+	netif_napi_add(ndev, &priv->napi, xgmac_poll, 64);
+	ret = register_netdev(ndev);
+	if (ret)
+		goto err_reg;
+
+	return 0;
+
+err_reg:
+	free_irq(priv->pmt_irq, ndev);
+err_pmt_irq:
+	free_irq(ndev->irq, ndev);
+err_irq:
+	iounmap(priv->base);
+err_io:
+	free_netdev(ndev);
+err_alloc:
+	release_mem_region(res->start, resource_size(res));
+	platform_set_drvdata(pdev, NULL);
+	return ret;
+}
+
+/**
+ * xgmac_dvr_remove
+ * @pdev: platform device pointer
+ * Description: this function resets the TX/RX processes, disables the MAC RX/TX
+ * changes the link status, releases the DMA descriptor rings,
+ * unregisters the MDIO bus and unmaps the allocated memory.
+ */
+static int xgmac_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct xgmac_priv *priv = netdev_priv(ndev);
+	struct resource *res;
+
+	xgmac_mac_disable(priv->base);
+
+	/* Free the IRQ lines */
+	free_irq(ndev->irq, ndev);
+	free_irq(priv->pmt_irq, ndev);
+
+	platform_set_drvdata(pdev, NULL);
+	unregister_netdev(ndev);
+
+	iounmap(priv->base);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	release_mem_region(res->start, resource_size(res));
+
+	free_netdev(ndev);
+
+	return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static void xgmac_pmt(void __iomem *ioaddr, unsigned long mode)
+{
+	unsigned int pmt = 0;
+
+	if (mode & WAKE_MAGIC)
+		pmt |= XGMAC_PMT_POWERDOWN | XGMAC_PMT_MAGIC_PKT;
+	if (mode & WAKE_UCAST)
+		pmt |= XGMAC_PMT_POWERDOWN | XGMAC_PMT_GLBL_UNICAST;
+
+	writel(pmt, ioaddr + XGMAC_PMT);
+}
+
+static int xgmac_suspend(struct device *dev)
+{
+	struct net_device *ndev = platform_get_drvdata(to_platform_device(dev));
+	struct xgmac_priv *priv = netdev_priv(ndev);
+	u32 value;
+
+	if (!ndev || !netif_running(ndev))
+		return 0;
+
+	netif_device_detach(ndev);
+	napi_disable(&priv->napi);
+	writel(0, priv->base + XGMAC_DMA_INTR_ENA);
+
+	if (device_may_wakeup(priv->device)) {
+		/* Stop TX/RX DMA Only */
+		value = readl(priv->base + XGMAC_DMA_CONTROL);
+		value &= ~(DMA_CONTROL_ST | DMA_CONTROL_SR);
+		writel(value, priv->base + XGMAC_DMA_CONTROL);
+
+		xgmac_pmt(priv->base, priv->wolopts);
+	} else
+		xgmac_mac_disable(priv->base);
+
+	return 0;
+}
+
+static int xgmac_resume(struct device *dev)
+{
+	struct net_device *ndev = platform_get_drvdata(to_platform_device(dev));
+	struct xgmac_priv *priv = netdev_priv(ndev);
+	void __iomem *ioaddr = priv->base;
+
+	if (!netif_running(ndev))
+		return 0;
+
+	xgmac_pmt(ioaddr, 0);
+
+	/* Enable the MAC and DMA */
+	xgmac_mac_enable(ioaddr);
+	writel(DMA_INTR_DEFAULT_MASK, ioaddr + XGMAC_DMA_STATUS);
+	writel(DMA_INTR_DEFAULT_MASK, ioaddr + XGMAC_DMA_INTR_ENA);
+
+	netif_device_attach(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static SIMPLE_DEV_PM_OPS(xgmac_pm_ops, xgmac_suspend, xgmac_resume);
+#define XGMAC_PM_OPS (&xgmac_pm_ops)
+#else
+#define XGMAC_PM_OPS NULL
+#endif /* CONFIG_PM_SLEEP */
+
+static const struct of_device_id xgmac_of_match[] = {
+	{ .compatible = "calxeda,hb-xgmac", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, xgmac_of_match);
+
+static struct platform_driver xgmac_driver = {
+	.driver = {
+		.name = "calxedaxgmac",
+		.of_match_table = xgmac_of_match,
+	},
+	.probe = xgmac_probe,
+	.remove = xgmac_remove,
+	.driver.pm = XGMAC_PM_OPS,
+};
+
+module_platform_driver(xgmac_driver);
+
+MODULE_AUTHOR("Calxeda, Inc.");
+MODULE_DESCRIPTION("Calxeda 10G XGMAC driver");
+MODULE_LICENSE("GPL v2");
-- 
1.7.5.4

^ permalink raw reply related

* Re: [RFC] iproute:  Support cross-compiling.
From: Ben Greear @ 2011-11-16  2:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20111115171722.02fa1c77@nehalam.linuxnetplumber.net>

On 11/15/2011 05:17 PM, Stephen Hemminger wrote:
> On Tue, 15 Nov 2011 17:04:42 -0800
> greearb@candelatech.com wrote:
>
>> From: Ben Greear<greearb@candelatech.com>
>>
>> This lets users use their own compiler instead of
>> hard-coding to use gcc.
>>
>> Also adds tests to disable some things that were not supported
>> in my ARM cross-compile toolchain.
>>
>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>
> Can't you do this by setting up the compile environment better?
> I would rather have the tools work with all features and handle
> errors from kernel rather than neutering it.

At the least, the parts that let you use your own CC could
be split out and applied?

I've no interest in trying to fix whatever is wrong with my
tool chain..just not worth the effort since I don't need
network namespaces and I don't need the arp daemon.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH v5 3/9] net: split netdev features to separate header
From: Ben Hutchings @ 2011-11-16  2:14 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, David S. Miller
In-Reply-To: <e2c98cf47788fb0e2b2272b78486aeca30dde5db.1321404954.git.mirq-linux@rere.qmqm.pl>

On Wed, 2011-11-16 at 02:29 +0100, Michał Mirosław wrote:
> Move features definitions to separate header so that linux/skbuff.h won't
> need to include linux/netdevice.h after netdev_features_t is introduced.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v5 2/9] net: ethtool: break association of ETH_FLAG_* with NETIF_F_*
From: Ben Hutchings @ 2011-11-16  2:06 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, David S. Miller
In-Reply-To: <768d41024675f5dd2ca20e77e2f04cc7a8aa0519.1321404954.git.mirq-linux@rere.qmqm.pl>

On Wed, 2011-11-16 at 02:29 +0100, Michał Mirosław wrote:
> This is the only place left where dev->features are directly
> exposed to userspace.
> 
> I know checkpatch.pl complains about __ethtool_{get,set}_flags(), but
> the code is easier to read this way.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox