Netdev List


* Re: [RFC PATCH net-next] tun: support retrieving multiple packets in a single read with IFF_MULTI_READ
From: Herbert Xu @ 2014-12-22 12:09 UTC (permalink / raw)
  To: Alex Gartrell
  Cc: jasonwang, davem, netdev, linux-kernel, mst, herbert, kernel-team,
	agartrell
In-Reply-To: <1417752000-27171-1-git-send-email-agartrell@fb.com>

Alex Gartrell <agartrell@fb.com> wrote:
> This patch adds the IFF_MULTI_READ flag.  This has the following behavior.
> 
> 1) If a read is too short for a packet, a single stripped packet will be read
> 
> 2) If a read is long enough for multiple packets, as many *full* packets
> will be read as possible.  We will not return a stripped packet, so even if
> there are many, many packets, we may get a short read.
> 
> In casual performance testing with a simple test program that simply reads
> and counts packets, IFF_MULTI_READ conservatively yielded a 30% CPU win, as
> measured by top.  Load was being driven by a bunch of hpings running on a
> server on the same L2 network (single hop through a top-of-rack switch).
> 
> Signed-off-by: Alex Gartrell <agartrell@fb.com>

As tun already has a socket interface can we do this through
recvmmsg?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply
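
The two read rules quoted in the message above can be sketched in plain userspace C. This is only an illustration, not the tun driver code: struct pkt and multi_read() are hypothetical names, "stripped" is assumed to mean "truncated", and any per-packet framing that a real reader would need in order to find packet boundaries is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packet record; the real tun driver queues sk_buffs. */
struct pkt {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy queued packets into buf following the two quoted rules.
 * Returns the number of bytes written and stores the number of
 * packets consumed from the queue in *consumed.
 */
static size_t multi_read(const struct pkt *q, size_t nq,
			 unsigned char *buf, size_t buflen, size_t *consumed)
{
	size_t off = 0, i = 0;

	*consumed = 0;
	if (nq == 0 || buflen == 0)
		return 0;

	/* Rule 1: buffer too short for the first packet, return one truncated packet. */
	if (q[0].len > buflen) {
		memcpy(buf, q[0].data, buflen);
		*consumed = 1;
		return buflen;
	}

	/* Rule 2: copy whole packets only; stop before the first one that does not fit. */
	while (i < nq && q[i].len <= buflen - off) {
		memcpy(buf + off, q[i].data, q[i].len);
		off += q[i].len;
		i++;
	}
	*consumed = i;
	return off;
}
```

With queued packets of 4, 6 and 3 bytes, a 12-byte buffer receives the first two packets (10 bytes) and the third stays queued, which is the "short read" case from rule 2.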

* Re: REGRESSION in nfnetlink on 3.18.x (bisected)
From: Pablo Neira Ayuso @ 2014-12-22 11:56 UTC (permalink / raw)
  To: Andre Tomt; +Cc: netfilter-devel, netdev
In-Reply-To: <5496075F.3060204@tomt.net>

[-- Attachment #1: Type: text/plain, Size: 816 bytes --]

On Sun, Dec 21, 2014 at 12:33:51AM +0100, Andre Tomt wrote:
> On at least Ubuntu 14.04 LTS and Ubuntu 14.10 "conntrack -E" has
> started failing with Linux 3.18.x. conntrack -L still works.
> 
> 14.04 and 14.10 ships conntrack-utils version 1.4.1, but 1.4.2 does
> not work either.
> 
> It fails with:
> ># conntrack -E
> >conntrack v1.4.2 (conntrack-tools): Can't open handler
> 
> strace shows:
> >bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
> >getsockname(3, {sa_family=AF_NETLINK, pid=14092, groups=00000000}, [12]) = 0
> >bind(3, {sa_family=AF_NETLINK, pid=14092, groups=00000007}, 12) = -1 EINVAL (Invalid argument)
> 
> Reverting 97840cb67ff5ac8add836684f011fd838518d698 - netfilter:
> nfnetlink: fix insufficient validation in nfnetlink_bind

Could you test this patch? Thanks.

[-- Attachment #2: 0001-netlink-fix-wrong-subscription-bitmask-to-group-mapp.patch --]
[-- Type: text/x-diff, Size: 1827 bytes --]

From f4f65150fd2129607a7bd25f007c258045237c8c Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 21 Dec 2014 21:48:36 +0100
Subject: [PATCH nf] netlink: fix wrong subscription bitmask to group mapping in
 binding callbacks

The subscription bitmask passed via struct sockaddr_nl is converted to
the group number when calling the netlink_bind() and netlink_unbind()
callbacks.

The conversion is however incorrect since bitmask (1 << 0) needs to be
mapped to group number 1. Note that you cannot specify the group number 0
(usually known as _NONE) from setsockopt() using NETLINK_ADD_MEMBERSHIP
since this is rejected through -EINVAL.

This problem became noticeable since 97840cb ("netfilter: nfnetlink:
fix insufficient validation in nfnetlink_bind") when binding to bitmask
(1 << 0) in ctnetlink.

Reported-by: Andre Tomt <andre@tomt.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netlink/af_netlink.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 074cf3e..cbcf73b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1420,7 +1420,7 @@ static void netlink_unbind(int group, long unsigned int groups,
 
 	for (undo = 0; undo < group; undo++)
 		if (test_bit(undo, &groups))
-			nlk->netlink_unbind(undo);
+			nlk->netlink_unbind(undo + 1);
 }
 
 static int netlink_bind(struct socket *sock, struct sockaddr *addr,
@@ -1458,7 +1458,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 		for (group = 0; group < nlk->ngroups; group++) {
 			if (!test_bit(group, &groups))
 				continue;
-			err = nlk->netlink_bind(group);
+			err = nlk->netlink_bind(group + 1);
 			if (!err)
 				continue;
 			netlink_unbind(group, groups, nlk);
-- 
1.7.10.4


^ permalink raw reply related
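
The mapping that the patch above corrects can be sketched as a small standalone C helper. mask_to_groups() is a hypothetical name used only for illustration; the one fact taken from the patch is that bit n of the subscription bitmask corresponds to group number n + 1, so group 0 is never produced.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Translate a sockaddr_nl-style subscription bitmask into 1-based
 * group numbers. Bit n of the mask selects group n + 1.
 * Returns how many group numbers were written to out (at most max).
 */
static size_t mask_to_groups(unsigned long mask, unsigned int *out, size_t max)
{
	size_t n = 0;
	unsigned int bit;

	for (bit = 0; bit < sizeof(mask) * CHAR_BIT && n < max; bit++)
		if (mask & (1UL << bit))
			out[n++] = bit + 1;	/* the "+ 1" is the fix */
	return n;
}
```

For the bitmask 0x7 seen in the strace output of the report, this yields groups 1, 2 and 3. Without the "+ 1", the callback would see group 0 for bit 0.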

* RE: [PATCH net] r8152: drop the tx packet with invalid length
From: Hayes Wang @ 2014-12-22 11:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tom Herbert, David Miller, netdev@vger.kernel.org, nic_swsd,
	linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <1419012837.9773.85.camel@edumazet-glaptop2.roam.corp.google.com>

> -----Original Message-----
> From: Hayes Wang 
> Sent: Monday, December 22, 2014 10:23 AM
> To: 'Eric Dumazet'
> Cc: Tom Herbert; David Miller; netdev@vger.kernel.org; 
> nic_swsd; linux-kernel@vger.kernel.org; linux-usb@vger.kernel.org
> Subject: RE: [PATCH net] r8152: drop the tx packet with invalid length
> 
>  Eric Dumazet [mailto:eric.dumazet@gmail.com] 
> > Sent: Saturday, December 20, 2014 2:14 AM
> [...]
> > Could you try following patch ?
> 
> Thank you. I would test it.

It works for me. Thanks.
 
Best Regards,
Hayes

^ permalink raw reply

* Re: OOPS in nf_ct_unlink_expect_report using Polycom RealPresence Mobile
From: zhuyj @ 2014-12-22 10:34 UTC (permalink / raw)
  To: Mike Galbraith, astx; +Cc: linux-kernel, netdev, zyjzyj2000
In-Reply-To: <1391174223.6395.3.camel@marge.simpson.net>

Please check the number of iptables rules. The oops may be caused by a
large number of iptables rules.

Best Regards!
Zhu Yanjun

On 01/31/2014 09:17 PM, Mike Galbraith wrote:
> (CC netdev)
>
> On Fri, 2014-01-31 at 12:05 +0100, astx wrote:
>> Using Polycom video conferencing software my homebrew linux NAT router
>> crashes with attached kernel oops message.
>> This error can be reproduced also using kernel 3.2.54. Kernel 2.6.35
>> seems to be stable.
>>
>> Disabling nf_nat_h323 and nf_conntrack_h323 avoids crash - but video
>> conferencing software is no more usable.
>>
>>
>> ===================================================================================
>>    BUG: unable to handle kernel paging request at 00100104
>> IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
>> *pdpt = 00000000359aa001 *pde = 0000000000000000
>> Oops: 0002 [#1] SMP
>> Modules linked in: nf_conntrack_netlink nfnetlink xt_mac xt_TCPMSS
>> ipt_MASQUERADE
>>    xt_pkttype xt_multiport xt_REDIRECT xt_nat iptable_mangle xt_LOG
>> xt_limit af_packet
>>    act_mirred cls_u32 sch_ingress sch_hfsc ifb xt_tcpudp ip6t_REJECT ipt_REJECT
>>    ip6table_raw iptable_raw xt_CT iptable_filter nf_nat_pptp nf_nat_proto_gre
>>    nf_conntrack_proto_udplite nf_conntrack_proto_dccp ip6table_mangle
>> iptable_nat
>>    nf_nat_ipv4 nf_nat_sip nf_nat_irc nf_nat_snmp_basic nf_conntrack_snmp
>>    nf_conntrack_broadcast nf_nat_h323 nf_nat_tftp nf_nat_ftp nf_nat
>> nf_conntrack_h323
>>    nf_conntrack_tftp nf_conntrack_proto_sctp nf_conntrack_sip nf_conntrack_irc
>>    nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_ftp nf_conntrack_ipv4
>>    nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables
>>    x_tables padlock_sha padlock_aes e_powersaver freq_table mperf via_cputemp
>>    hwmon_vid serio_raw pcspkr i2c_viapro ehci_pci fan thermal processor 8139too
>>    sg thermal_sys button shpchp 8139cp pci_hotplug mii via_agp ext4 crc16 jbd2
>>    pata_via sata_via libata sd_mod scsi_mod ohci_hcd uhci_hcd ehci_hcd
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.28-9500-smp_m #1
>> Hardware name:    /CN700-8237, BIOS 6.00 PG 08/30/2007
>> task: c07ce180 ti: f6408000 task.ti: c07c2000
>> EIP: 0060:[<f8214f07>] EFLAGS: 00210206 CPU: 0
>> EIP is at nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
>> EAX: 00100100 EBX: eb636bc0 ECX: 00000000 EDX: eb461540
>> ESI: c0804e00 EDI: eb461544 EBP: f6409f08 ESP: f6409eec
>>    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> CR0: 8005003b CR2: 00100104 CR3: 359d4000 CR4: 000006b0
>> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> DR6: ffff0ff0 DR7: 00000400
>> Stack:
>>    00000000 00200286 f6409f08 c0244bd8 eb636bc0 00100100 00000000 f6409f18
>>    f8215687 f598ede8 c0804e00 f6409f28 f8211c99 f598ede8 f598ee50 f6409f5c
>>    f8212e5e 00000003 00000000 00000000 00000004 eb461514 f598ede8 00000000
>> Call Trace:
>>    [<c0244bd8>] ? del_timer+0x48/0x70
>>    [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack]
>>    [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack]
>>    [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack]
>>    [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
>>    [<c024442d>] call_timer_fn+0x1d/0x80
>>    [<c024461e>] run_timer_softirq+0x18e/0x1a0
>>    [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
>>    [<c023e6f3>] __do_softirq+0xa3/0x170
>>    [<c023e650>] ? __local_bh_enable+0x70/0x70
>>    <IRQ>
>>    [<c023e587>] ? irq_exit+0x67/0xa0
>>    [<c0202af6>] ? do_IRQ+0x46/0xb0
>>    [<c027ad05>] ? clockevents_notify+0x35/0x110
>>    [<c066ac6c>] ? common_interrupt+0x2c/0x40
>>    [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0
>>    [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100
>>    [<c02085f8>] ? arch_cpu_idle+0x8/0x30
>>    [<c027314b>] ? cpu_idle_loop+0x4b/0x140
>>    [<c0273258>] ? cpu_startup_entry+0x18/0x20
>>    [<c066056d>] ? rest_init+0x5d/0x70
>>    [<c0813ac8>] ? start_kernel+0x2ec/0x2f2
>>    [<c081364f>] ? repair_env_string+0x5b/0x5b
>>    [<c0813269>] ? i386_start_kernel+0x33/0x35
>> Code: 8b 7b 0c 8b b6 98 00 00 00 85 c0 89 07 74 03 89 78 04 c7 43 0c 00
>>    02 20 00 83 ae ec 05 00 00 01 8b 03 8b 7b 04 85 c0 89 07 74 03 <89> 78
>>    04 8b 43 7c c7 03 00 01 10 00 c7 43 04 00 02 20 00 80 6c
>> EIP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
>> SS:ESP 0068:f6409eec
>> CR2: 0000000000100104
>> ---[ end trace 79fe2e6b81f54dee ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Rebooting in 300 seconds..
>> ===================================================================================
>>
>>
>> Polycom Version: 3.1-44477
>> running on device: Apple iPad Mini
>> using operating system: iOS Version: 7.0.4
>>
>>
>> Attached also my kernel config. Hopefully someone could help...
>>
>> BR, Toni
>

^ permalink raw reply

* [PATCH net] net: Fix stacked vlan offload features computation
From: Toshiaki Makita @ 2014-12-22 10:04 UTC (permalink / raw)
  To: David S . Miller; +Cc: Toshiaki Makita, Jesse Gross, netdev

When vlan tags are stacked, it is very likely that the outer tag is stored
in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
Currently netif_skb_features() first looks at skb->protocol even if there
is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
encapsulated by the inner vlan instead of the inner vlan protocol.
This allows GSO packets to be passed to HW and they end up being
corrupted.

Fixes: 58e998c6d239 ("offloading: Force software GSO for multiple vlan tags.")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 net/core/dev.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f411c28..a6afd70 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2570,11 +2570,14 @@ netdev_features_t netif_skb_features(struct sk_buff *skb)
 	if (gso_segs > dev->gso_max_segs || gso_segs < dev->gso_min_segs)
 		features &= ~NETIF_F_GSO_MASK;

-	if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD)) {
-		struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
-		protocol = veh->h_vlan_encapsulated_proto;
-	} else if (!vlan_tx_tag_present(skb)) {
-		return harmonize_features(skb, features);
+	if (!vlan_tx_tag_present(skb)) {
+		if (unlikely(protocol == htons(ETH_P_8021Q) ||
+			     protocol == htons(ETH_P_8021AD))) {
+			struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
+			protocol = veh->h_vlan_encapsulated_proto;
+		} else {
+			return harmonize_features(skb, features);
+		}
 	}

 	features = netdev_intersect_features(features,
-- 
1.8.1.2

^ permalink raw reply related
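
The difference between the old and the new protocol lookup in the patch above can be sketched with a mock skb. Everything here is a simplification for illustration: struct mock_skb, classify_old() and classify_fixed() are hypothetical, values are in host byte order, and the sketch only classifies the tag depth rather than computing real netdev features.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Mock of the three skb properties the lookup depends on. */
struct mock_skb {
	bool     hw_tag_present;	/* outer tag stored in skb->vlan_tci */
	uint16_t protocol;		/* skb->protocol */
	uint16_t encap_proto;		/* h_vlan_encapsulated_proto of an in-packet tag */
};

enum vlan_depth { VLAN_NONE, VLAN_SINGLE, VLAN_STACKED };

static bool is_vlan_proto(uint16_t p)
{
	return p == ETH_P_8021Q || p == ETH_P_8021AD;
}

/* Old order: skb->protocol is examined first, even when vlan_tci holds a tag. */
static enum vlan_depth classify_old(const struct mock_skb *skb)
{
	uint16_t protocol = skb->protocol;

	if (is_vlan_proto(protocol))
		protocol = skb->encap_proto;	/* skips the inner tag */
	else if (!skb->hw_tag_present)
		return VLAN_NONE;
	return is_vlan_proto(protocol) ? VLAN_STACKED : VLAN_SINGLE;
}

/* New order: only look inside the packet when no tag is in vlan_tci. */
static enum vlan_depth classify_fixed(const struct mock_skb *skb)
{
	uint16_t protocol = skb->protocol;

	if (!skb->hw_tag_present) {
		if (!is_vlan_proto(protocol))
			return VLAN_NONE;
		protocol = skb->encap_proto;	/* one in-packet tag accounted for */
	}
	return is_vlan_proto(protocol) ? VLAN_STACKED : VLAN_SINGLE;
}
```

For an skb with an outer tag in vlan_tci and protocol set to ETH_P_8021Q for the inner tag, the old order reports a single tag while the fixed order reports stacked tags, which is the case the commit message describes.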

* Re: [ovs-dev] OVS + BPF, make sense?
From: Thomas Graf @ 2014-12-22  9:53 UTC (permalink / raw)
  To: Andy Zhou; +Cc: dev@openvswitch.com, netdev@vger.kernel.org
In-Reply-To: <CACzMAJLtTKM57tD=COrZ5KOrZJVPVcUab=t2+y=CEU1e5xHV8g@mail.gmail.com>

Thanks a lot for sharing these minutes.

On 12/19/14 at 06:49pm, Andy Zhou wrote:
> Possible use cases of BPF in OVS Linux kernel datapath
> ===========================================
>
> [...]
>
> 4. Using BPF to implement overall OVS kernel module functionality
> 
>    Alexei likes this approach the most. The potential benefits are:

This would be my favourite as well in the long term, assuming that the
performance benefits we hope for can be proven. A logical evolutionary
process might be 2, 3 and then go for the full coverage.

One caveat: we can't just remove the existing Netlink-based action
datapath, because non-OVS users exist which rely on it. So this would
need to exist in parallel unless we can get all users on board to
transition over to this new architecture.

^ permalink raw reply

* [PATCH net] cxgb4vf: Fix ethtool get_settings for VF driver
From: Hariprasad Shenai @ 2014-12-22  9:44 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, kumaras, Hariprasad Shenai

Decode and display the port type and module type in the ethtool
get_settings() call.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h     |    4 +
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    |  140 ++++++++++++++++++--
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |    2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c     |   54 ++++----
 4 files changed, 160 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index d00a751..6049f70 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -96,6 +96,9 @@ struct port_info {
 	s16 xact_addr_filt;		/* index of our MAC address filter */
 	u16 rss_size;			/* size of VI's RSS table slice */
 	u8 pidx;			/* index into adapter port[] */
+	s8 mdio_addr;
+	u8 port_type;			/* firmware port type */
+	u8 mod_type;			/* firmware module type */
 	u8 port_id;			/* physical port ID */
 	u8 nqsets;			/* # of "Queue Sets" */
 	u8 first_qset;			/* index of first "Queue Set" */
@@ -522,6 +525,7 @@ static inline struct adapter *netdev2adap(const struct net_device *dev)
  * is "contracted" to provide for the common code.
  */
 void t4vf_os_link_changed(struct adapter *, int, int);
+void t4vf_os_portmod_changed(struct adapter *, int);
 
 /*
  * SGE function prototype declarations.
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index aa74ec3..2215d43 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -44,6 +44,7 @@
 #include <linux/etherdevice.h>
 #include <linux/debugfs.h>
 #include <linux/ethtool.h>
+#include <linux/mdio.h>
 
 #include "t4vf_common.h"
 #include "t4vf_defs.h"
@@ -210,6 +211,38 @@ void t4vf_os_link_changed(struct adapter *adapter, int pidx, int link_ok)
 }
 
 /*
+ * The port module type has changed on the indicated "port" (Virtual
+ * Interface).
+ */
+void t4vf_os_portmod_changed(struct adapter *adapter, int pidx)
+{
+	static const char * const mod_str[] = {
+		NULL, "LR", "SR", "ER", "passive DA", "active DA", "LRM"
+	};
+	const struct net_device *dev = adapter->port[pidx];
+	const struct port_info *pi = netdev_priv(dev);
+
+	if (pi->mod_type == FW_PORT_MOD_TYPE_NONE)
+		dev_info(adapter->pdev_dev, "%s: port module unplugged\n",
+			 dev->name);
+	else if (pi->mod_type < ARRAY_SIZE(mod_str))
+		dev_info(adapter->pdev_dev, "%s: %s port module inserted\n",
+			 dev->name, mod_str[pi->mod_type]);
+	else if (pi->mod_type == FW_PORT_MOD_TYPE_NOTSUPPORTED)
+		dev_info(adapter->pdev_dev, "%s: unsupported optical port "
+			 "module inserted\n", dev->name);
+	else if (pi->mod_type == FW_PORT_MOD_TYPE_UNKNOWN)
+		dev_info(adapter->pdev_dev, "%s: unknown port module inserted, "
+			 "forcing TWINAX\n", dev->name);
+	else if (pi->mod_type == FW_PORT_MOD_TYPE_ERROR)
+		dev_info(adapter->pdev_dev, "%s: transceiver module error\n",
+			 dev->name);
+	else
+		dev_info(adapter->pdev_dev, "%s: unknown module type %d "
+			 "inserted\n", dev->name, pi->mod_type);
+}
+
+/*
  * Net device operations.
  * ======================
  */
@@ -1193,24 +1226,103 @@ static void cxgb4vf_poll_controller(struct net_device *dev)
  * state of the port to which we're linked.
  */
 
-/*
- * Return current port link settings.
- */
-static int cxgb4vf_get_settings(struct net_device *dev,
-				struct ethtool_cmd *cmd)
-{
-	const struct port_info *pi = netdev_priv(dev);
+static unsigned int t4vf_from_fw_linkcaps(enum fw_port_type type,
+					  unsigned int caps)
+{
+	unsigned int v = 0;
+
+	if (type == FW_PORT_TYPE_BT_SGMII || type == FW_PORT_TYPE_BT_XFI ||
+	    type == FW_PORT_TYPE_BT_XAUI) {
+		v |= SUPPORTED_TP;
+		if (caps & FW_PORT_CAP_SPEED_100M)
+			v |= SUPPORTED_100baseT_Full;
+		if (caps & FW_PORT_CAP_SPEED_1G)
+			v |= SUPPORTED_1000baseT_Full;
+		if (caps & FW_PORT_CAP_SPEED_10G)
+			v |= SUPPORTED_10000baseT_Full;
+	} else if (type == FW_PORT_TYPE_KX4 || type == FW_PORT_TYPE_KX) {
+		v |= SUPPORTED_Backplane;
+		if (caps & FW_PORT_CAP_SPEED_1G)
+			v |= SUPPORTED_1000baseKX_Full;
+		if (caps & FW_PORT_CAP_SPEED_10G)
+			v |= SUPPORTED_10000baseKX4_Full;
+	} else if (type == FW_PORT_TYPE_KR)
+		v |= SUPPORTED_Backplane | SUPPORTED_10000baseKR_Full;
+	else if (type == FW_PORT_TYPE_BP_AP)
+		v |= SUPPORTED_Backplane | SUPPORTED_10000baseR_FEC |
+		     SUPPORTED_10000baseKR_Full | SUPPORTED_1000baseKX_Full;
+	else if (type == FW_PORT_TYPE_BP4_AP)
+		v |= SUPPORTED_Backplane | SUPPORTED_10000baseR_FEC |
+		     SUPPORTED_10000baseKR_Full | SUPPORTED_1000baseKX_Full |
+		     SUPPORTED_10000baseKX4_Full;
+	else if (type == FW_PORT_TYPE_FIBER_XFI ||
+		 type == FW_PORT_TYPE_FIBER_XAUI ||
+		 type == FW_PORT_TYPE_SFP ||
+		 type == FW_PORT_TYPE_QSFP_10G ||
+		 type == FW_PORT_TYPE_QSA) {
+		v |= SUPPORTED_FIBRE;
+		if (caps & FW_PORT_CAP_SPEED_1G)
+			v |= SUPPORTED_1000baseT_Full;
+		if (caps & FW_PORT_CAP_SPEED_10G)
+			v |= SUPPORTED_10000baseT_Full;
+	} else if (type == FW_PORT_TYPE_BP40_BA ||
+		   type == FW_PORT_TYPE_QSFP) {
+		v |= SUPPORTED_40000baseSR4_Full;
+		v |= SUPPORTED_FIBRE;
+	}
+
+	if (caps & FW_PORT_CAP_ANEG)
+		v |= SUPPORTED_Autoneg;
+	return v;
+}
+
+static int cxgb4vf_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+	const struct port_info *p = netdev_priv(dev);
+
+	if (p->port_type == FW_PORT_TYPE_BT_SGMII ||
+	    p->port_type == FW_PORT_TYPE_BT_XFI ||
+	    p->port_type == FW_PORT_TYPE_BT_XAUI)
+		cmd->port = PORT_TP;
+	else if (p->port_type == FW_PORT_TYPE_FIBER_XFI ||
+		 p->port_type == FW_PORT_TYPE_FIBER_XAUI)
+		cmd->port = PORT_FIBRE;
+	else if (p->port_type == FW_PORT_TYPE_SFP ||
+		 p->port_type == FW_PORT_TYPE_QSFP_10G ||
+		 p->port_type == FW_PORT_TYPE_QSA ||
+		 p->port_type == FW_PORT_TYPE_QSFP) {
+		if (p->mod_type == FW_PORT_MOD_TYPE_LR ||
+		    p->mod_type == FW_PORT_MOD_TYPE_SR ||
+		    p->mod_type == FW_PORT_MOD_TYPE_ER ||
+		    p->mod_type == FW_PORT_MOD_TYPE_LRM)
+			cmd->port = PORT_FIBRE;
+		else if (p->mod_type == FW_PORT_MOD_TYPE_TWINAX_PASSIVE ||
+			 p->mod_type == FW_PORT_MOD_TYPE_TWINAX_ACTIVE)
+			cmd->port = PORT_DA;
+		else
+			cmd->port = PORT_OTHER;
+	} else
+		cmd->port = PORT_OTHER;
 
-	cmd->supported = pi->link_cfg.supported;
-	cmd->advertising = pi->link_cfg.advertising;
+	if (p->mdio_addr >= 0) {
+		cmd->phy_address = p->mdio_addr;
+		cmd->transceiver = XCVR_EXTERNAL;
+		cmd->mdio_support = p->port_type == FW_PORT_TYPE_BT_SGMII ?
+			MDIO_SUPPORTS_C22 : MDIO_SUPPORTS_C45;
+	} else {
+		cmd->phy_address = 0;  /* not really, but no better option */
+		cmd->transceiver = XCVR_INTERNAL;
+		cmd->mdio_support = 0;
+	}
+
+	cmd->supported = t4vf_from_fw_linkcaps(p->port_type,
+					       p->link_cfg.supported);
+	cmd->advertising = t4vf_from_fw_linkcaps(p->port_type,
+					    p->link_cfg.advertising);
 	ethtool_cmd_speed_set(cmd,
-			      netif_carrier_ok(dev) ? pi->link_cfg.speed : -1);
+			      netif_carrier_ok(dev) ? p->link_cfg.speed : 0);
 	cmd->duplex = DUPLEX_FULL;
-
-	cmd->port = (cmd->supported & SUPPORTED_TP) ? PORT_TP : PORT_FIBRE;
-	cmd->phy_address = pi->port_id;
-	cmd->transceiver = XCVR_EXTERNAL;
-	cmd->autoneg = pi->link_cfg.autoneg;
+	cmd->autoneg = p->link_cfg.autoneg;
 	cmd->maxtxpkt = 0;
 	cmd->maxrxpkt = 0;
 	return 0;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
index 8d3237f..b9debb4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
@@ -230,7 +230,7 @@ struct adapter_params {
 
 static inline bool is_10g_port(const struct link_config *lc)
 {
-	return (lc->supported & SUPPORTED_10000baseT_Full) != 0;
+	return (lc->supported & FW_PORT_CAP_SPEED_10G) != 0;
 }
 
 static inline bool is_x_10g_port(const struct link_config *lc)
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
index 02e8833..21dc9a2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
@@ -245,6 +245,10 @@ static int hash_mac_addr(const u8 *addr)
 	return a & 0x3f;
 }
 
+#define ADVERT_MASK (FW_PORT_CAP_SPEED_100M | FW_PORT_CAP_SPEED_1G |\
+		     FW_PORT_CAP_SPEED_10G | FW_PORT_CAP_SPEED_40G | \
+		     FW_PORT_CAP_SPEED_100G | FW_PORT_CAP_ANEG)
+
 /**
  *	init_link_config - initialize a link's SW state
  *	@lc: structure holding the link state
@@ -259,8 +263,8 @@ static void init_link_config(struct link_config *lc, unsigned int caps)
 	lc->requested_speed = 0;
 	lc->speed = 0;
 	lc->requested_fc = lc->fc = PAUSE_RX | PAUSE_TX;
-	if (lc->supported & SUPPORTED_Autoneg) {
-		lc->advertising = lc->supported;
+	if (lc->supported & FW_PORT_CAP_ANEG) {
+		lc->advertising = lc->supported & ADVERT_MASK;
 		lc->autoneg = AUTONEG_ENABLE;
 		lc->requested_fc |= PAUSE_AUTONEG;
 	} else {
@@ -280,7 +284,6 @@ int t4vf_port_init(struct adapter *adapter, int pidx)
 	struct fw_vi_cmd vi_cmd, vi_rpl;
 	struct fw_port_cmd port_cmd, port_rpl;
 	int v;
-	u32 word;
 
 	/*
 	 * Execute a VI Read command to get our Virtual Interface information
@@ -319,19 +322,11 @@ int t4vf_port_init(struct adapter *adapter, int pidx)
 	if (v)
 		return v;
 
-	v = 0;
-	word = be16_to_cpu(port_rpl.u.info.pcap);
-	if (word & FW_PORT_CAP_SPEED_100M)
-		v |= SUPPORTED_100baseT_Full;
-	if (word & FW_PORT_CAP_SPEED_1G)
-		v |= SUPPORTED_1000baseT_Full;
-	if (word & FW_PORT_CAP_SPEED_10G)
-		v |= SUPPORTED_10000baseT_Full;
-	if (word & FW_PORT_CAP_SPEED_40G)
-		v |= SUPPORTED_40000baseSR4_Full;
-	if (word & FW_PORT_CAP_ANEG)
-		v |= SUPPORTED_Autoneg;
-	init_link_config(&pi->link_cfg, v);
+	v = be32_to_cpu(port_rpl.u.info.lstatus_to_modtype);
+	pi->port_type = FW_PORT_CMD_PTYPE_G(v);
+	pi->mod_type = FW_PORT_MOD_TYPE_NA;
+
+	init_link_config(&pi->link_cfg, be16_to_cpu(port_rpl.u.info.pcap));
 
 	return 0;
 }
@@ -1491,7 +1486,7 @@ int t4vf_handle_fw_rpl(struct adapter *adapter, const __be64 *rpl)
 		 */
 		const struct fw_port_cmd *port_cmd =
 			(const struct fw_port_cmd *)rpl;
-		u32 word;
+		u32 stat, mod;
 		int action, port_id, link_ok, speed, fc, pidx;
 
 		/*
@@ -1509,21 +1504,21 @@ int t4vf_handle_fw_rpl(struct adapter *adapter, const __be64 *rpl)
 		port_id = FW_PORT_CMD_PORTID_G(
 			be32_to_cpu(port_cmd->op_to_portid));
 
-		word = be32_to_cpu(port_cmd->u.info.lstatus_to_modtype);
-		link_ok = (word & FW_PORT_CMD_LSTATUS_F) != 0;
+		stat = be32_to_cpu(port_cmd->u.info.lstatus_to_modtype);
+		link_ok = (stat & FW_PORT_CMD_LSTATUS_F) != 0;
 		speed = 0;
 		fc = 0;
-		if (word & FW_PORT_CMD_RXPAUSE_F)
+		if (stat & FW_PORT_CMD_RXPAUSE_F)
 			fc |= PAUSE_RX;
-		if (word & FW_PORT_CMD_TXPAUSE_F)
+		if (stat & FW_PORT_CMD_TXPAUSE_F)
 			fc |= PAUSE_TX;
-		if (word & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_100M))
+		if (stat & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_100M))
 			speed = 100;
-		else if (word & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_1G))
+		else if (stat & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_1G))
 			speed = 1000;
-		else if (word & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_10G))
+		else if (stat & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_10G))
 			speed = 10000;
-		else if (word & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_40G))
+		else if (stat & FW_PORT_CMD_LSPEED_V(FW_PORT_CAP_SPEED_40G))
 			speed = 40000;
 
 		/*
@@ -1540,12 +1535,21 @@ int t4vf_handle_fw_rpl(struct adapter *adapter, const __be64 *rpl)
 				continue;
 
 			lc = &pi->link_cfg;
+
+			mod = FW_PORT_CMD_MODTYPE_G(stat);
+			if (mod != pi->mod_type) {
+				pi->mod_type = mod;
+				t4vf_os_portmod_changed(adapter, pidx);
+			}
+
 			if (link_ok != lc->link_ok || speed != lc->speed ||
 			    fc != lc->fc) {
 				/* something changed */
 				lc->link_ok = link_ok;
 				lc->speed = speed;
 				lc->fc = fc;
+				lc->supported =
+					be16_to_cpu(port_cmd->u.info.pcap);
 				t4vf_os_link_changed(adapter, pidx, link_ok);
 			}
 		}
-- 
1.7.1

^ permalink raw reply related

* caif: Fix napi poll list corruption
From: Herbert Xu @ 2014-12-22  9:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Vrabel, netdev, xen-devel, konrad.wilk, boris.ostrovsky,
	edumazet, David S. Miller
In-Reply-To: <5497D3D9.2070509@redhat.com>

On Mon, Dec 22, 2014 at 04:18:33PM +0800, Jason Wang wrote:
>
> btw, looks like at least caif_virtio has the same issue.

Good catch.

-- >8 --
The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
interrupt masking in NAPI) breaks caif.

It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty.  However, like some
other drivers caif tries to do a last-ditch check and if there is
more work it will call napi_schedule and then immediately process
some of this new work.  Should the entire budget be consumed while
processing such new work then we will violate the new caller
contract.

This patch fixes this by not touching any work when we reschedule
in caif.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index a5fefb9..b306210 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -257,7 +257,6 @@ static int cfv_rx_poll(struct napi_struct *napi, int quota)
 	struct vringh_kiov *riov = &cfv->ctx.riov;
 	unsigned int skb_len;
 
-again:
 	do {
 		skb = NULL;
 
@@ -322,7 +321,6 @@ exit:
 		    napi_schedule_prep(napi)) {
 			vringh_notify_disable_kern(cfv->vr_rx);
 			__napi_schedule(napi);
-			goto again;
 		}
 		break;

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related
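
The caller contract described in the commit message above can be illustrated with a mock poll routine. This is a sketch under stated assumptions, not the caif_virtio code: struct mock_dev, mock_poll() and contract_ok() are hypothetical, and requeued_by_driver stands in for the driver having called napi_schedule itself.

```c
#include <stdbool.h>

struct mock_dev {
	int pending;			/* packets waiting in the ring */
	int late_arrivals;		/* packets found by the last-ditch check */
	bool requeued_by_driver;	/* driver rescheduled itself during poll */
};

/*
 * Mock poll. With goto_again set it mimics the old behaviour: after
 * rescheduling it keeps processing the new work in the same call.
 */
static int mock_poll(struct mock_dev *d, int budget, bool goto_again)
{
	int done = 0;

	d->requeued_by_driver = false;
again:
	while (done < budget && d->pending > 0) {
		d->pending--;
		done++;
	}

	/* Last-ditch check: more work arrived while we were finishing. */
	if (done < budget && d->late_arrivals > 0) {
		d->pending += d->late_arrivals;
		d->late_arrivals = 0;
		d->requeued_by_driver = true;
		if (goto_again)
			goto again;	/* old behaviour, removed by the patch */
	}
	return done;
}

/* Contract: a poll that consumed the whole budget must not have requeued itself. */
static bool contract_ok(const struct mock_dev *d, int done, int budget)
{
	return !(done == budget && d->requeued_by_driver);
}
```

With 3 pending packets, 5 late arrivals and a budget of 4, the patched flow returns 3 after rescheduling and leaves the new work for the next poll. The old flow goes back, reaches the full budget of 4 and breaks the contract.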

* Re: [PATCH] net: unisys: adding unisys virtnic driver
From: zhuyj @ 2014-12-22  8:32 UTC (permalink / raw)
  To: Erik Arfvidson, benjamin.romer, netdev, dzickus, davem,
	Bruce.Vessey, sparmaintainer, prarit
In-Reply-To: <1418842340-29894-1-git-send-email-earfvids@redhat.com>

How does this virtnic differ from veth and tun/tap?

Zhu Yanjun

On 12/18/2014 02:52 AM, Erik Arfvidson wrote:
> The purpose of this patch is to add Unisys virtual network driver
> into the network directory and also to start a discussion about
> the requirements needed.
>
> Signed-off-by: Erik Arfvidson <earfvids@redhat.com>
> ---
>   drivers/net/virtnic.c | 2475 +++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 2475 insertions(+)
>   create mode 100644 drivers/net/virtnic.c
>
> diff --git a/drivers/net/virtnic.c b/drivers/net/virtnic.c
> new file mode 100644
> index 0000000..0af48f3
> --- /dev/null
> +++ b/drivers/net/virtnic.c
> @@ -0,0 +1,2475 @@
> +/* virtnic.c
> + *
> + * Copyright © 2010 - 2014 UNISYS CORPORATION
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or (at
> + * your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + */
> +
> +#define EXPORT_SYMTAB
> +
> +#include <linux/kernel.h>
> +#ifdef CONFIG_MODVERSIONS
> +#include <config/modversions.h>
> +#endif
> +
> +#include "uniklog.h"
> +#include "diagnostics/appos_subsystems.h"
> +#include "uisutils.h"
> +#include "uisthread.h"
> +#include "uisqueue.h"
> +#include "visorchipset.h"
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/pci.h>
> +#include <linux/spinlock.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/netdevice.h>
> +#include <linux/etherdevice.h>
> +#include <linux/string.h>
> +#include <linux/tcp.h>
> +#include <linux/ip.h>
> +#include <linux/types.h>
> +#include <linux/uuid.h>
> +#include <linux/debugfs.h>
> +
> +#include "virtpci.h"
> +#include "version.h"
> +
> +/* this is shorter than using __FILE__ (full path name) in */
> +/* debug/info/error messages */
> +#define __MYFILE__ "virtnic.c"
> +
> +/* turn off collecting of debug statistics */
> +#define VIRTNIC_STATS 0
> +
> + /* MAX_BUF = 64 lines x 32 MAXVHBA x 80 characters
> + *         = 163840 bytes ~ 40 pages
> + */
> +#define MAX_BUF 163840
> +
> +/*
> + * uisnic                   virtnic
> + *         <---- xmit ---  virtnic_xmit(hard-start-xmit)
> + *         <-- rcvpost --  open, virtnic_rx
> + *	   <-- unpost ---  close
> + *	   <-- enb/dis --  open, close
> + *
> + * open & close can't run at the same time as each other or rcv/xmit, but
> + * virtnic_xmit and virtnic_rx could be running at the same time.
> + * and all messages being sent to uisnic MUST be sent so if the queue is
> + * full we have to retry, but we don't want to retry with a spinlock held.
> + */
> +
> +/*****************************************************/
> +/* Forward declarations                              */
> +/*****************************************************/
> +static int virtnic_probe(struct virtpci_dev *dev,
> +			 const struct pci_device_id *id);
> +static void virtnic_remove(struct virtpci_dev *dev);
> +static int virtnic_change_mtu(struct net_device *netdev, int new_mtu);
> +static int virtnic_close(struct net_device *netdev);
> +static struct net_device_stats *virtnic_get_stats(struct net_device *netdev);
> +static int virtnic_open(struct net_device *netdev);
> +static int virtnic_ioctl(struct net_device *netdev, struct ifreq *ifr,
> +			 int cmd);
> +static void virtnic_rx(struct uiscmdrsp *cmdrsp);
> +static int virtnic_xmit(struct sk_buff *skb, struct net_device *netdev);
> +static void virtnic_xmit_timeout(struct net_device *netdev);
> +static void virtnic_set_multi(struct net_device *netdev);
> +static int virtnic_serverdown(struct virtpci_dev *virtpcidev, u32 state);
> +static int virtnic_serverup(struct virtpci_dev *virtpcidev);
> +static void virtnic_serverdown_complete(struct work_struct *work);
> +static void virtnic_timeout_reset(struct work_struct *work);
> +static int process_incoming_rsps(void *);
> +static ssize_t info_debugfs_read(struct file *file, char __user *buf,
> +				 size_t len, loff_t *offset);
> +static ssize_t enable_ints_write(struct file *file,
> +				 const char __user *buffer,
> +				 size_t count, loff_t *ppos);
> +
> +/*****************************************************/
> +/* Globals                                           */
> +/*****************************************************/
> +
> +#define VIRTNIC_XMIT_TIMEOUT (5 * HZ)	/* Default timeout period in jiffies */
> +#define VIRTNIC_INFINITE_RESPONSE_WAIT 0
> +#define INTERRUPT_VECTOR_MASK 0x3F
> +
> +static struct workqueue_struct *virtnic_serverdown_workqueue;
> +static struct workqueue_struct *virtnic_timeout_reset_workqueue;
> +
> +static const struct pci_device_id virtnic_id_table[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_UNISYS, PCI_DEVICE_ID_VIRTNIC) },
> +	{ 0 },
> +};
> +/* export virtnic_id_table */
> +MODULE_DEVICE_TABLE(pci, virtnic_id_table);
> +
> +static struct virtpci_driver virtnic_driver = {
> +	.name = "uisvirtnic",
> +	.version = VERSION,
> +	.vertag = NULL,
> +	.id_table = virtnic_id_table,
> +	.probe = virtnic_probe,
> +	.remove = virtnic_remove,
> +	.suspend = virtnic_serverdown,
> +	.resume = virtnic_serverup
> +};
> +
> +#define SEND_ENBDIS(ndev, state, cmdrsp, queue, insertlock, stats) { \
> +	DBGINF("sending rcv enb/dis netdev:%p state:%d\n", ndev, state); \
> +	cmdrsp->net.enbdis.enable = state; \
> +	cmdrsp->net.enbdis.context = ndev; \
> +	cmdrsp->net.type = NET_RCV_ENBDIS; \
> +	cmdrsp->cmdtype = CMD_NET_TYPE; \
> +	uisqueue_put_cmdrsp_with_lock_client(queue, cmdrsp, IOCHAN_TO_IOPART, \
> +					     (void *)insertlock, \
> +					     DONT_ISSUE_INTERRUPT, \
> +					     (uint64_t)NULL, \
> +					     OK_TO_WAIT, "vnic"); \
> +	stats.sent_enbdis++;\
> +}
> +
> +struct chanstat {
> +	unsigned long got_rcv;	/* count of NET_RCV received */
> +	unsigned long got_enbdisack;	/* count of NET_RCV_ENBDIS_ACK rcvd */
> +	unsigned long got_xmit_done;	/* count of NET_XMIT_DONE received */
> +	unsigned long xmit_fail;	/* count of NET_XMIT_DONE failures */
> +	unsigned long sent_enbdis;	/* count of NET_RCV_ENBDIS sent */
> +	unsigned long sent_promisc;	/* count of NET_RCV_PROMISC sent */
> +	unsigned long sent_post;	/* count of NET_RCV_POST sent */
> +	unsigned long sent_xmit;	/* count of NET_XMIT sent */
> +	unsigned long reject_count;	/* count of NET_XMIT rejected because */
> +	/* of BUSY/queue full */
> +	unsigned long extra_rcvbufs_sent;
> +#if VIRTNIC_STATS
> +	unsigned long reject_jiffies_start;	/* jiffie count at start of
> +						   NET_XMIT rejects */
> +#endif /* VIRTNIC_STATS */
> +};
> +
> +struct datachan {
> +	struct chaninfo chinfo;
> +	struct chanstat chstat;
> +};
> +
> +struct virtnic_info {
> +	struct virtpci_dev *virtpcidev;
> +	struct net_device *netdev;
> +	struct net_device_stats net_stats;
> +	spinlock_t priv_lock; /* spinlock check for private lock */
> +	struct datachan datachan;
> +	struct sk_buff **rcvbuf;	/* rcvbuf is the array of rcv buffer */
> +	/* we post to */
> +	unsigned long long uniquenum;
> +
> +	/* the IOPART end */
> +	int num_rcv_bufs;	/* indicates how many receive buffers the
> +				   vnic will post */
> +	int num_rcv_bufs_could_not_alloc;
> +	atomic_t num_rcv_bufs_in_iovm;	/* indicates how many receive buffers
> +					   have actually been sent to the iovm */
> +	unsigned long inner_loop_limit_reached_cnt;
> +	unsigned long alloc_failed_in_if_needed_cnt;
> +	unsigned long alloc_failed_in_repost_return_cnt;
> +
> +	struct sk_buff_head xmitbufhead;	/* xmitbufhead is the head of
> +						   the  xmit buffer list that
> +						   have been sent to the IOPART
> +						   end */
> +	int max_outstanding_net_xmits;	/* absolute max number of outstanding
> +					   xmits - should never hit this */
> +	int upper_threshold_net_xmits;	/* high water mark for calling
> +					   netif_stop_queue() */
> +	int lower_threshold_net_xmits;	/* low water mark for calling
> +					   netif_wake_queue() */
> +	uuid_le zoneguid;		/* specifies the zone for the switch in
> +					   which this VNIC resides  */
> +	struct uiscmdrsp *cmdrsp_rcv;	/* cmdrsp_rcv is used for
> +					   posting/unposting rcv buffers */
> +	unsigned short enabled;	/* 0 disabled 1 enabled to receive */
> +	unsigned short enab_dis_acked;	/* NET_RCV_ENABLE/DISABLE acked by
> +					   uisnic */
> +	atomic_t usage;			/* count of users */
> +	unsigned short old_flags;	/* flags as they were prior to
> +					   set_multicast_list */
> +	struct uiscmdrsp *xmit_cmdrsp;	/* used to issue NET_XMIT -  there is
> +					   never more than one xmit in progress
> +					   at a time */
> +	struct dentry *eth_debugfs_dir;	/* this points to /proc/eth?
> +						   directory */
> +	struct dentry *zone_debugfs_entry;	/* this points to
> +						   /proc/virtnic/eth?/zone */
> +	/* file */
> +	struct dentry *clientstr_debugfs_entry;/* this points to
> +						  /proc/virtnic/eth?/clientstr
> +						  file  */
> +	struct irq_info intr;	/* use recvInterrupt info  to connect
> +					   to this to receive interrupts when
> +					   IOs complete */
> +	int interrupt_vector;
> +	int thread_wait_ms;
> +	int queuefullmsg_logged;	/* flag for throttling queue full */
> +	/* messages */
> +	/* some debug counters */
> +	ulong n_rcv0;			/* # rcvs of 0 buffers */
> +	ulong n_rcv1;			/* # rcvs of 1 buffer */
> +	ulong n_rcv2;			/* # rcvs of 2 buffers */
> +	ulong n_rcvx;			/* # rcvs of >2 buffers */
> +	ulong found_repost_rcvbuf_cnt;	/* # times we called repost_rcvbuf_cnt */
> +	ulong repost_found_skb_cnt;	/* # times found the skb */
> +	ulong n_repost_deficit;		/* # times we couldn't find all of the
> +					   rcv buffers */
> +	ulong bad_rcv_buf;		/* # times we neglected to
> +					     free the rcv skb because
> +					     we didn't know where it
> +					     came from */
> +	ulong n_rcv_packet_not_accepted;	/* # bogus recv packets */
> +	bool server_down;
> +	bool server_change_state;
> +	unsigned long long interrupts_rcvd;
> +	unsigned long long interrupts_notme;
> +	unsigned long long interrupts_disabled;
> +	unsigned long long busy_cnt;
> +	unsigned long long flow_control_upper_hits;
> +	unsigned long long flow_control_lower_hits;
> +	struct work_struct serverdown_completion;
> +	struct work_struct timeout_reset;
> +	uint64_t __iomem *flags_addr;
> +	atomic_t interrupt_rcvd;
> +	wait_queue_head_t rsp_queue;
> +};
> +
> +struct virtnic_devices_open {
> +	struct net_device *netdev;
> +	struct virtnic_info *vnicinfo;
> +};
> +
> +static ssize_t show_zone(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	struct net_device *net = to_net_dev(dev);
> +	struct virtnic_info *vnicinfo = netdev_priv(net);
> +
> +	return scnprintf(buf, PAGE_SIZE, "%pUL\n", &vnicinfo->zoneguid);
> +}
> +
> +static ssize_t show_clientstr(struct device *dev, struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct net_device *net = to_net_dev(dev);
> +	struct virtnic_info *vnicinfo = netdev_priv(net);
> +	struct spar_io_channel_protocol *chan =
> +		(struct spar_io_channel_protocol *)vnicinfo->
> +		datachan.chinfo.queueinfo->chan;
> +
> +	return scnprintf(buf, PAGE_SIZE, "%s\n",
> +			(char *)&chan->client_string);
> +}
> +static DEVICE_ATTR(clientstr, S_IRUGO, show_clientstr, NULL);
> +static DEVICE_ATTR(zone, S_IRUGO, show_zone, NULL);
> +
> +#define VIRTNICSOPENMAX 32
> +/* array of open devices maintained by open() and close() */
> +static struct virtnic_devices_open num_virtnic_open[VIRTNICSOPENMAX];
> +static struct dentry *virtnic_debugfs_dir;
> +
> +static const struct file_operations debugfs_info_fops = {
> +	.read = info_debugfs_read,
> +};
> +
> +static const struct file_operations debugfs_enable_ints_fops = {
> +	.write = enable_ints_write,
> +};
> +
> +/*****************************************************/
> +/* Probe Remove Functions                            */
> +/*****************************************************/
> +/* set up net.rcvpost struct in cmdrsp.
> + * all rcv buf skb are allocated at RCVPOST_BUF_SIZE, so length is
> + * RCVPOST_BUF_SIZE by default. and since RCVPOST_BUF_SIZE < 2048, one
> + * phys_info struct can describe the rcv buf.
> + */
> +static inline void
> +post_skb(struct uiscmdrsp *cmdrsp,
> +	 struct virtnic_info *vnicinfo, struct sk_buff *skb)
> +{
> +	cmdrsp->net.buf = skb;
> +	cmdrsp->net.rcvpost.frag.pi_pfn = page_to_pfn(virt_to_page(skb->data));
> +	cmdrsp->net.rcvpost.frag.pi_off =
> +		(unsigned long)skb->data & PI_PAGE_MASK;
> +	cmdrsp->net.rcvpost.frag.pi_len = skb->len;
> +	cmdrsp->net.rcvpost.unique_num = vnicinfo->uniquenum;
> +
> +	DBGINF("RCV_POST skb:%p pfn:%llu off:%x len:%d\n", skb,
> +	       cmdrsp->net.rcvpost.frag.pi_pfn,
> +	       cmdrsp->net.rcvpost.frag.pi_off,
> +	       cmdrsp->net.rcvpost.frag.pi_len);
> +	if ((cmdrsp->net.rcvpost.frag.pi_off + skb->len) > PI_PAGE_SIZE) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "**** pi_off:0x%x pi_len:%d SPAN ACROSS A PAGE\n",
> +			   cmdrsp->net.rcvpost.frag.pi_off, skb->len);
> +	} else {
> +		cmdrsp->net.type = NET_RCV_POST;
> +		cmdrsp->cmdtype = CMD_NET_TYPE;
> +		uisqueue_put_cmdrsp_with_lock_client(vnicinfo->datachan.chinfo.
> +						     queueinfo, cmdrsp,
> +						     IOCHAN_TO_IOPART,
> +						     (void *)&vnicinfo->
> +						     datachan.chinfo.insertlock,
> +						     DONT_ISSUE_INTERRUPT,
> +						     (uint64_t)NULL,
> +						     OK_TO_WAIT,
> +						     "vnic");
> +		atomic_inc(&vnicinfo->num_rcv_bufs_in_iovm);
> +		vnicinfo->datachan.chstat.sent_post++;
> +	}
> +}
> +
> +static irqreturn_t
> +virtnic_ISR(int irq, void *dev_id)
> +{
> +	struct virtnic_info *vnicinfo = (struct virtnic_info *)dev_id;
> +
> +	struct channel_header __iomem *p_channel_header;
> +
> +	struct signal_queue_header __iomem *pqhdr;
> +	uint64_t mask;
> +	unsigned long long rc1;
> +
> +	if (vnicinfo == NULL)
> +		return IRQ_NONE;
> +	vnicinfo->interrupts_rcvd++;
> +	p_channel_header = vnicinfo->datachan.chinfo.queueinfo->chan;
> +	if (((readq(&p_channel_header->features) &
> +	      ULTRA_IO_IOVM_IS_OK_WITH_DRIVER_DISABLING_INTS) != 0) &&
> +	    ((readq(&p_channel_header->features) &
> +	      ULTRA_IO_DRIVER_DISABLES_INTS) != 0)) {
> +		/*
> +		 * should not enter this path because we setup without
> +		 * DRIVER_DISABLES_INTS.
> +		 */
> +		vnicinfo->interrupts_disabled++;
> +		mask = ~ULTRA_CHANNEL_ENABLE_INTS;
> +		rc1 = uisqueue_interlocked_and(vnicinfo->flags_addr, mask);
> +	}
> +	if (spar_signalqueue_empty(p_channel_header, IOCHAN_FROM_IOPART)) {
> +		vnicinfo->interrupts_notme++;
> +		return IRQ_NONE;
> +	}
> +	pqhdr = (struct signal_queue_header __iomem *)
> +		((char __iomem *)p_channel_header +
> +		 readq(&p_channel_header->ch_space_offset)) +
> +		IOCHAN_FROM_IOPART;
> +	writeq(readq(&pqhdr->num_irq_received) + 1,
> +	       &pqhdr->num_irq_received);
> +	atomic_set(&vnicinfo->interrupt_rcvd, 1);
> +	wake_up_interruptible(&vnicinfo->rsp_queue);
> +	return IRQ_HANDLED;
> +}
> +
> +static const struct net_device_ops virtnic_dev_ops = {
> +	.ndo_open = virtnic_open,
> +	.ndo_stop = virtnic_close,
> +	.ndo_start_xmit = virtnic_xmit,
> +	.ndo_get_stats = virtnic_get_stats,
> +	.ndo_do_ioctl = virtnic_ioctl,
> +	.ndo_change_mtu = virtnic_change_mtu,
> +	.ndo_tx_timeout = virtnic_xmit_timeout,
> +	.ndo_set_rx_mode = virtnic_set_multi,
> +};
> +
> +static int
> +virtnic_probe(struct virtpci_dev *virtpcidev, const struct pci_device_id *id)
> +{
> +	struct net_device *netdev = NULL;
> +	struct virtnic_info *vnicinfo;
> +	int err;
> +	int rsp;
> +	irq_handler_t handler = virtnic_ISR;
> +	struct channel_header __iomem *p_channel_header;
> +	struct signal_queue_header __iomem *pqhdr;
> +	uint64_t mask;
> +
> +#define RETFAIL(res) {\
> +		kfree(vnicinfo->cmdrsp_rcv);  \
> +		kfree(vnicinfo->xmit_cmdrsp); \
> +		kfree(vnicinfo->rcvbuf);      \
> +		if (vnicinfo->interrupt_vector != -1)		\
> +			free_irq(vnicinfo->interrupt_vector, vnicinfo); \
> +		if (netdev)						\
> +			free_netdev(netdev);				\
> +		return res;						\
> +}
> +
> +	DBGINF("virtpci_dev:%p\n", virtpcidev);
> +	DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>",
> +	       virtpcidev->bus_no, virtpcidev->device_no);
> +	netdev = alloc_etherdev(sizeof(struct virtnic_info));
> +	if (netdev == NULL) {
> +		LOGERR("**** FAILED to alloc etherdev\n");
> +		return -ENOMEM;
> +	}
> +	netdev->netdev_ops = &virtnic_dev_ops;
> +	netdev->watchdog_timeo = VIRTNIC_XMIT_TIMEOUT;
> +
> +	memcpy(netdev->dev_addr, virtpcidev->net.mac_addr, MAX_MACADDR_LEN);
> +	netdev->addr_len = MAX_MACADDR_LEN;
> +	/* netdev->name should be ethx already */
> +	netdev->dev.parent = &virtpcidev->generic_dev;
> +
> +	/* setup our private struct */
> +	vnicinfo = netdev_priv(netdev);
> +	memset(vnicinfo, 0, sizeof(struct virtnic_info));
> +	vnicinfo->interrupt_vector = -1;
> +	vnicinfo->netdev = netdev;
> +	vnicinfo->virtpcidev = virtpcidev;
> +	init_waitqueue_head(&vnicinfo->rsp_queue);
> +	spin_lock_init(&vnicinfo->priv_lock);
> +	vnicinfo->datachan.chinfo.queueinfo = &virtpcidev->queueinfo;
> +	spin_lock_init(&vnicinfo->datachan.chinfo.insertlock);
> +	vnicinfo->enabled = 0;	/* not yet */
> +	atomic_set(&vnicinfo->usage, 1);	/* starting val */
> +	vnicinfo->zoneguid = virtpcidev->net.zone_uuid;
> +	vnicinfo->num_rcv_bufs = virtpcidev->net.num_rcv_bufs;
> +	LOGINFNAME(vnicinfo->netdev, "num_rcv_bufs =  %d\n",
> +		   vnicinfo->num_rcv_bufs);
> +	vnicinfo->rcvbuf = kmalloc(sizeof(struct sk_buff *) *
> +				   vnicinfo->num_rcv_bufs, GFP_ATOMIC);
> +	if (vnicinfo->rcvbuf == NULL) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "**** FAILED to allocate memory for %d receive buffers.\n",
> +			   vnicinfo->num_rcv_bufs);
> +		RETFAIL(-ENOMEM);
> +	}
> +	memset(vnicinfo->rcvbuf, 0,
> +	       sizeof(struct sk_buff *) * vnicinfo->num_rcv_bufs);
> +	/* set the net_xmit outstanding threshold */
> +	vnicinfo->max_outstanding_net_xmits =
> +	    max(3, ((vnicinfo->num_rcv_bufs / 3) - 2));
> +	/* always leave two slots open but you should have 3 at a minimum */
> +	LOGINFNAME(vnicinfo->netdev, "max_outstanding_net_xmits =  %d\n",
> +		   vnicinfo->max_outstanding_net_xmits);
> +	vnicinfo->upper_threshold_net_xmits =
> +	    max(2, vnicinfo->max_outstanding_net_xmits - 1);
> +	LOGINFNAME(vnicinfo->netdev, "upper_threshold_net_xmits =  %d\n",
> +		   vnicinfo->upper_threshold_net_xmits);
> +	vnicinfo->lower_threshold_net_xmits =
> +	    max(1, vnicinfo->max_outstanding_net_xmits / 2);
> +	LOGINFNAME(vnicinfo->netdev, "lower_threshold_net_xmits =  %d\n",
> +		   vnicinfo->lower_threshold_net_xmits);
> +	skb_queue_head_init(&vnicinfo->xmitbufhead);
> +
> +	/* create a cmdrsp we can use to post and unpost rcv buffers  */
> +	vnicinfo->cmdrsp_rcv = kmalloc(SIZEOF_CMDRSP, GFP_ATOMIC);
> +	if (vnicinfo->cmdrsp_rcv == NULL) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "**** FAILED to allocate cmdrsp to use for posting rcv buffers\n");
> +		RETFAIL(-ENOMEM);
> +	}
> +	vnicinfo->xmit_cmdrsp = kmalloc(SIZEOF_CMDRSP, GFP_ATOMIC);
> +	if (vnicinfo->xmit_cmdrsp == NULL) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "**** FAILED to allocate cmdrsp to use for xmits\n");
> +		RETFAIL(-ENOMEM);
> +	}
> +	INIT_WORK(&vnicinfo->serverdown_completion,
> +		  virtnic_serverdown_complete);
> +	INIT_WORK(&vnicinfo->timeout_reset, virtnic_timeout_reset);
> +	vnicinfo->server_down = false;
> +	vnicinfo->server_change_state = false;
> +
> +	/* set the default mtu */
> +	netdev->mtu = virtpcidev->net.mtu;
> +
> +	vnicinfo->intr = virtpcidev->intr;
> +	/* buffers will be allocated in open using mtu */
> +
> +	/* save off netdev in virtpcidev  */
> +	virtpcidev->net.netdev = netdev;
> +
> +	/* start thread that will receive responses */
> +	writeq(readq(&vnicinfo->datachan.chinfo.queueinfo->chan->features) |
> +	       ULTRA_IO_CHANNEL_IS_POLLING,
> +	       &vnicinfo->datachan.chinfo.queueinfo->chan->features);
> +	DBGINF("starting rsp thread queueinfo:%p threadinfo:%p\n",
> +	       vnicinfo->datachan.chinfo.queueinfo,
> +	       &vnicinfo->datachan.chinfo.threadinfo);
> +	p_channel_header = vnicinfo->datachan.chinfo.queueinfo->chan;
> +	pqhdr = (struct signal_queue_header __iomem *)
> +		((char __iomem *)p_channel_header +
> +		 readq(&p_channel_header->ch_space_offset)) +
> +	    IOCHAN_FROM_IOPART;
> +	vnicinfo->flags_addr = (__force uint64_t __iomem *)&pqhdr->features;
> +	vnicinfo->thread_wait_ms = 2;
> +	if (!uisthread_start(&vnicinfo->datachan.chinfo.threadinfo,
> +			     process_incoming_rsps, &vnicinfo->datachan,
> +			     "vnic_incoming")) {
> +		LOGERRNAME(vnicinfo->netdev, "**** FAILED to start thread\n");
> +		RETFAIL(-ENODEV);
> +	}
> +
> +	/* register_netdev */
> +	LOGINFNAME(vnicinfo->netdev, "sendInterruptHandle=0x%16llX",
> +		   (unsigned long long)vnicinfo->intr.send_irq_handle);
> +	LOGINFNAME(vnicinfo->netdev, "recvInterruptHandle=0x%16llX",
> +		   (unsigned long long)vnicinfo->intr.recv_irq_handle);
> +	LOGINFNAME(vnicinfo->netdev, "recvInterruptVector=0x%8X",
> +		   vnicinfo->intr.recv_irq_vector);
> +	LOGINFNAME(vnicinfo->netdev, "recvInterruptShared=0x%2X",
> +		   vnicinfo->intr.recv_irq_shared);
> +	LOGINFNAME(vnicinfo->netdev, "netdev->name=%s", netdev->name);
> +	vnicinfo->interrupt_vector = vnicinfo->intr.recv_irq_handle &
> +	    INTERRUPT_VECTOR_MASK;
> +	netdev->irq = vnicinfo->interrupt_vector;
> +	err = register_netdev(netdev);
> +	if (err) {
> +		uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
> +		RETFAIL(err);
> +	}
> +
> +	/* create debugfs ethx directory */
> +	vnicinfo->eth_debugfs_dir = debugfs_create_dir(netdev->name,
> +						       virtnic_debugfs_dir);
> +	if (!vnicinfo->eth_debugfs_dir) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "****FAILED to create proc dir entry:%s\n",
> +			   netdev->name);
> +		uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
> +		RETFAIL(-ENODEV);
> +	}
> +
> +	if (device_create_file(&netdev->dev, &dev_attr_zone) < 0) {
> +		uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
> +		RETFAIL(-ENODEV);
> +	}
> +	if (device_create_file(&netdev->dev, &dev_attr_clientstr) < 0) {
> +		device_remove_file(&netdev->dev, &dev_attr_zone);
> +		uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
> +		RETFAIL(-ENODEV);
> +	}
> +	/* request the receive interrupt */
> +	rsp = request_irq(vnicinfo->interrupt_vector, handler, IRQF_SHARED,
> +			  netdev->name, vnicinfo);
> +	if (rsp != 0) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "request_irq(%d) uislib_vnic_ISR request failed with rsp=%d\n",
> +			   vnicinfo->interrupt_vector, rsp);
> +		vnicinfo->interrupt_vector = -1;
> +	} else {
> +		uint64_t __iomem *features_addr =
> +		    &vnicinfo->datachan.chinfo.queueinfo->chan->features;
> +		LOGINFNAME(vnicinfo->netdev,
> +			   "request_irq(%d) uislib_vnic_ISR request succeeded\n",
> +			   vnicinfo->interrupt_vector);
> +		mask = ~(ULTRA_IO_CHANNEL_IS_POLLING |
> +			 ULTRA_IO_DRIVER_DISABLES_INTS |
> +			 ULTRA_IO_DRIVER_SUPPORTS_ENHANCED_RCVBUF_CHECKING);
> +		uisqueue_interlocked_and(features_addr, mask);
> +		mask = ULTRA_IO_DRIVER_ENABLES_INTS |
> +		    ULTRA_IO_DRIVER_SUPPORTS_ENHANCED_RCVBUF_CHECKING;
> +		uisqueue_interlocked_or(features_addr, mask);
> +
> +		vnicinfo->thread_wait_ms = 2000;
> +	}
> +
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "Added VirtNic:%p %s insertlock:%p %02x:%02x:%02x:%02x:%02x:%02x\n",
> +		   netdev, netdev->name, &vnicinfo->datachan.chinfo.insertlock,
> +		   netdev->dev_addr[0], netdev->dev_addr[1],
> +		   netdev->dev_addr[2], netdev->dev_addr[3],
> +		   netdev->dev_addr[4], netdev->dev_addr[5]);
> +	return 0;
> +}
> +
> +static void
> +virtnic_remove(struct virtpci_dev *virtpcidev)
> +{
> +	struct net_device *netdev = virtpcidev->net.netdev;
> +	struct virtnic_info *vnicinfo;
> +
> +	vnicinfo = netdev_priv(netdev);
> +
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "virtpcidev:%p netdev:%p name:%s vnicinfo:%p\n",
> +		   virtpcidev, netdev, netdev->name, vnicinfo);
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "virtpcidev busNo<<%d>>devNo<<%d>>",
> +		   virtpcidev->bus_no, virtpcidev->device_no);
> +	/* REMOVE netdev */
> +	DBGINF("unregistering netdev\n");
> +	if (vnicinfo->interrupt_vector != -1)
> +		free_irq(vnicinfo->interrupt_vector, vnicinfo);
> +	unregister_netdev(netdev);
> +	/* unregister_netdev() will call virtnic_close, which sends out a */
> +	/* disable; don't take the thread down until after that */
> +	uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
> +
> +	/* freeing of rcv bufs should have happened in close. */
> +	/* free cmdrsp we allocated for rcv post/unpost */
> +	kfree(vnicinfo->cmdrsp_rcv);
> +	kfree(vnicinfo->xmit_cmdrsp);
> +
> +	/* remove sysfs attribute files */
> +	device_remove_file(&netdev->dev, &dev_attr_zone);
> +	device_remove_file(&netdev->dev, &dev_attr_clientstr);
> +
> +	debugfs_remove(vnicinfo->eth_debugfs_dir);
> +	LOGINFNAME(vnicinfo->netdev, "removed dentry %s\n",
> +		   netdev->name);
> +
> +	kfree(vnicinfo->rcvbuf);
> +	free_netdev(netdev);
> +
> +	LOGINF("virtnic removed\n");
> +}
> +
> +/*****************************************************/
> +/* NIC statistics handling                           */
> +/*****************************************************/
> +
> +/* update rcv stats - locking done by invoker */
> +#define UPD_RCV_STATS { \
> +	vnicinfo->net_stats.rx_packets++;  \
> +	vnicinfo->net_stats.rx_bytes += skb->len;  \
> +}
> +
> +/* update xmt stats - locking done by invoker */
> +#define UPD_XMT_STATS { \
> +	vnicinfo->net_stats.tx_packets++;  \
> +	vnicinfo->net_stats.tx_bytes += skb->len;  \
> +}
> +
> +static struct net_device_stats *
> +virtnic_get_stats(struct net_device *netdev)
> +{
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +
> +	/* take this opportunity to print out our internal stats */
> +	DBGINF
> +	    ("NET_RCV_ENBDIS sent: %ld     NET_RCV_ENBDIS_ACK received: %ld\n",
> +	     vnicinfo->datachan.chstat.sent_enbdis,
> +	     vnicinfo->datachan.chstat.got_enbdisack);
> +
> +	DBGINF("NET_RCV received: %ld        NET_RCV_POST sent: %ld\n",
> +	       vnicinfo->datachan.chstat.got_rcv,
> +	       vnicinfo->datachan.chstat.sent_post);
> +
> +	DBGINF("extra NET_RCV_POST sent: %ld\n",
> +	       vnicinfo->datachan.chstat.extra_rcvbufs_sent);
> +
> +	DBGINF("NET_XMIT sent: %ld           NET_XMIT_DONE received: %ld\n",
> +	       vnicinfo->datachan.chstat.sent_xmit,
> +	       vnicinfo->datachan.chstat.got_xmit_done);
> +
> +	DBGINF("XMIT failures: %ld           NET_RCV_PROMISC sent: %ld\n",
> +	       vnicinfo->datachan.chstat.xmit_fail,
> +	       vnicinfo->datachan.chstat.sent_promisc);
> +
> +	DBGINF("XMIT reject/busy: %ld\n",
> +	       vnicinfo->datachan.chstat.reject_count);
> +
> +	return &vnicinfo->net_stats;
> +}
> +
> +/*****************************************************/
> +/* Local functions                                   */
> +/*****************************************************/
> +
> +/*
> + * This function allocates skb, skb->data for first fragment. If Mtu
> + * size is > default, it allocates frags.
> + */
> +static struct sk_buff *
> +alloc_rcv_buf(struct net_device *netdev)
> +{
> +	struct sk_buff *skb;
> +
> +/*
> + * NOTE: the first fragment in each rcv buffer is pointed to by rcvskb->data.
> + * For now all rcv buffers will be RCVPOST_BUF_SIZE in length, so the firstfrag
> + * is large enough to hold 1514.
> + */
> +	DBGINF("netdev->name <<%s>>:  allocating skb len:%d\n", netdev->name,
> +	       RCVPOST_BUF_SIZE);
> +	skb = alloc_skb(RCVPOST_BUF_SIZE, GFP_ATOMIC | __GFP_NOWARN);
> +	if (!skb) {
> +		LOGVER("**** alloc_skb failed\n");
> +		return NULL;
> +	}
> +	skb->dev = netdev;
> +	skb->len = RCVPOST_BUF_SIZE;
> +	/* current value of mtu doesn't come into play here; large
> +	 * packets will just end up using multiple rcv buffers all of
> +	 * same size
> +	 */
> +	skb->data_len = 0;	/* alloc_skb already zeroes it out;
> +				   set here for clarity. */
> +	return skb;
> +}
> +
> +static int
> +init_rcv_bufs(struct net_device *netdev, struct virtnic_info *vnicinfo)
> +{
> +	int i, count;
> +
> +	DBGINF("netdev->name <<%s>>", netdev->name);
> +	/*
> +	 * allocate fixed number of receive buffers to post to uisnic
> +	 * post receive buffers after we've allocated a required
> +	 * amount
> +	 */
> +	for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
> +		vnicinfo->rcvbuf[i] = alloc_rcv_buf(netdev);
> +		if (!vnicinfo->rcvbuf[i])
> +			break;	/* if we failed to allocate one let us stop */
> +	}
> +	if (i < vnicinfo->num_rcv_bufs) {
> +		LOGWRNNAME(vnicinfo->netdev,
> +			   "only allocated %d of %d receive buffers", i,
> +			   vnicinfo->num_rcv_bufs);
> +		if (i == 0) {
> +			/* couldn't even allocate one - bail out */
> +			LOGERRNAME(vnicinfo->netdev,
> +				   "**** FAILED to allocate any rcv buffers\n");
> +			return -ENOMEM;
> +		}
> +	}
> +	count = i;
> +	/* Ensure we can alloc 2/3rd of the requested number of
> +	 * buffers. 2/3 is an arbitrary choice; used also in ndis
> +	 * init.c.
> +	 */
> +	if (count < ((2 * vnicinfo->num_rcv_bufs) / 3)) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "**** FAILED to allocate enough rcv bufs; allocated only:%d MAX_NET_RCV_BUFS:%d\n",
> +			   count, MAX_NET_RCV_BUFS);
> +		/* free receive buffers we did allocate and then bail out */
> +		for (i = 0; i < count; i++) {
> +			kfree_skb(vnicinfo->rcvbuf[i]);
> +			vnicinfo->rcvbuf[i] = NULL;
> +		}
> +		return -ENOMEM;
> +	}
> +
> +	/* post receive buffers to receive incoming input - without holding */
> +	/* lock - we've not enabled nor started the queue so there shouldn't */
> +	/* be any rcv or xmit activity */
> +	for (i = 0; i < count; i++)
> +		post_skb(vnicinfo->cmdrsp_rcv, vnicinfo, vnicinfo->rcvbuf[i]);
> +
> +	/* push through with what buffers we've got - unallocated ones will */
> +	/* be null */
> +	LOGINFNAME(vnicinfo->netdev, "Allocated & posted %d rcv buffers\n",
> +		   count);
> +
> +	return 0;
> +}
> +
> +/* Sends disable to IOVM and frees receive buffers that were posted to
> + * IOVM (cleared by IOVM when disable is received)
> + * returns 0 on success, negative number on failure
> + *
> + * timeout is defined in msecs (timeout of 0 specifies infinite wait)
> + */
> +static int
> +virtnic_disable_with_timeout(struct net_device *netdev, const int timeout)
> +{
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +	int i, count = 0;
> +	unsigned long flags;
> +	int wait = 0;
> +
> +	LOGINFNAME(vnicinfo->netdev, "netdev->name <<%s>>", netdev->name);
> +	/* stop the transmit queue so nothing more can be transmitted */
> +	netif_stop_queue(netdev);
> +
> +	/* send a msg telling the other end we are stopping incoming pkts */
> +	spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +	vnicinfo->enabled = 0;
> +	vnicinfo->enab_dis_acked = 0;	/* must wait for ack */
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +	/* send disable and wait for ack - don't hold lock when
> +	 * sending disable because if the queue is full, insert might
> +	 * sleep.
> +	 */
> +	SEND_ENBDIS(netdev, 0, vnicinfo->cmdrsp_rcv,
> +		    vnicinfo->datachan.chinfo.queueinfo,
> +		    &vnicinfo->datachan.chinfo.insertlock,
> +		    vnicinfo->datachan.chstat);
> +
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "Waiting for ENBDIS ACK before freeing rcv buffers...\n");
> +	/* wait for ack to arrive before we try to free rcv buffers
> +	 * NOTE: the other end automatically unposts the rcv buffers
> +	 * when it gets a disable.
> +	 */
> +	while ((timeout == VIRTNIC_INFINITE_RESPONSE_WAIT) ||
> +	       (wait < timeout)) {
> +		spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +		if (vnicinfo->enab_dis_acked) {
> +			/* now we can continue with disable */
> +			break;
> +		} else if (vnicinfo->server_down ||
> +			vnicinfo->server_change_state) {
> +			LOGERRNAME(vnicinfo->netdev,
> +				   "IOVM is down so disable will not be acknowledged.  Stopping wait.\n");
> +			spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +			return -1;
> +		}
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +		wait += schedule_timeout(msecs_to_jiffies(10));
> +	}
> +	if (!vnicinfo->enab_dis_acked) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "IOVM did not respond to Disable in allocated time (%d msecs).\n",
> +			   timeout);
> +		spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +		return -1;
> +	}
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "Got ENBDIS ACK; now waiting for 0 usage count...\n");
> +
> +	/*
> +	 * wait for usage to go to 1 (no other users) before freeing
> +	 * rcv buffers
> +	 */
> +	if (atomic_read(&vnicinfo->usage) > 1) {
> +		/* wait for usage count to be 1 */
> +		while (1) {
> +			set_current_state(TASK_INTERRUPTIBLE);
> +			spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +			schedule_timeout(msecs_to_jiffies(10));
> +			spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +			if (atomic_read(&vnicinfo->usage) == 1) {
> +				break;	/* go do work and only after
> +					   that give up lock */
> +			}
> +		}
> +	}
> +	/* we've set enabled to 0, so we can give up the lock. */
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "Usage count is 0; freeing the rcv buffers now\n");
> +
> +	/* free rcv buffers - other end has automatically unposted
> +	 * them on disable
> +	 */
> +	for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
> +		if (vnicinfo->rcvbuf[i]) {
> +			kfree_skb(vnicinfo->rcvbuf[i]);
> +			vnicinfo->rcvbuf[i] = NULL;
> +			count++;
> +		}
> +	}
> +	LOGINFNAME(vnicinfo->netdev, "Freed %d rcv bufs\n", count);
> +
> +	/* remove references from debug array */
> +	for (i = 0; i < VIRTNICSOPENMAX; i++) {
> +		if (num_virtnic_open[i].netdev == netdev) {
> +			num_virtnic_open[i].netdev = NULL;
> +			num_virtnic_open[i].vnicinfo = NULL;
> +			break;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/* Wait indefinitely for IOVM to acknowledge disable request */
> +static int
> +virtnic_disable(struct net_device *netdev)
> +{
> +	return virtnic_disable_with_timeout(netdev,
> +					    VIRTNIC_INFINITE_RESPONSE_WAIT);
> +}
> +
> +/* Sends enable to IOVM, inits, and  posts receive buffers to IOVM
> + * returns 0 on success, negative number on failure
> + *
> + * timeout is defined in msecs (timeout of 0 specifies infinite wait)
> + */
> +static int
> +virtnic_enable_with_timeout(struct net_device *netdev, const int timeout)
> +{
> +	int i;
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +	unsigned long flags;
> +	int wait = 0;
> +
> +	/* NOTE: the other end automatically unposts the rcv buffers when
> +	 * it gets a disable.
> +	 */
> +	i = init_rcv_bufs(netdev, vnicinfo);
> +	if (i < 0)
> +		return i;
> +
> +	spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +	vnicinfo->enabled = 1;
> +	/* now we're ready, let's send an ENB to uisnic but until we
> +	 * get an ACK back from uisnic, we'll drop the packets
> +	 */
> +	vnicinfo->n_rcv_packet_not_accepted = 0;
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +	/* send enable and wait for ack - don't hold lock when sending
> +	 * enable because if the queue is full, insert might sleep.
> +	 */
> +	SEND_ENBDIS(netdev, 1, vnicinfo->cmdrsp_rcv,
> +		    vnicinfo->datachan.chinfo.queueinfo,
> +		    &vnicinfo->datachan.chinfo.insertlock,
> +		    vnicinfo->datachan.chstat);
> +
> +	LOGINFNAME(vnicinfo->netdev, "netdev->name <<%s>>", netdev->name);
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "Waiting for ENBDIS ACK before starting device queue...\n");
> +	while ((timeout == VIRTNIC_INFINITE_RESPONSE_WAIT) ||
> +	       (wait < timeout)) {
> +		spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +		if (vnicinfo->enab_dis_acked) {
> +			/* now we can continue  */
> +			break;
> +		} else if (vnicinfo->server_down ||
> +			   vnicinfo->server_change_state) {
> +			/* IOVM is going down so don't wait for a response */
> +			LOGERRNAME(vnicinfo->netdev,
> +				   "IOVM is down so enable will not be acknowledged.  Stopping wait.\n");
> +			spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +			return -1;
> +		}
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +		wait += schedule_timeout(msecs_to_jiffies(10));
> +	}
> +	if (!vnicinfo->enab_dis_acked) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "IOVM did not respond to Enable in allocated time (%d msecs).\n",
> +			   timeout);
> +		spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +		return -1;
> +	}
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +	LOGINFNAME(vnicinfo->netdev, "Got ENBDIS ACK\n");
> +
> +	/* find an open slot in the array to save off VirtNic
> +	 * references for debug
> +	 */
> +	for (i = 0; i < VIRTNICSOPENMAX; i++) {
> +		if (num_virtnic_open[i].netdev == NULL) {
> +			num_virtnic_open[i].netdev = netdev;
> +			num_virtnic_open[i].vnicinfo = vnicinfo;
> +			break;
> +		}
> +	}
> +	if (i == VIRTNICSOPENMAX)
> +		LOGINFNAME(vnicinfo->netdev,
> +			   "No storage for debug ref for netdev = 0x%p vnicinfo = 0x%p\n",
> +			   netdev, vnicinfo);
> +
> +	return 0;
> +}
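
A note on the wait loop in virtnic_enable_with_timeout() above.
schedule_timeout() returns the jiffies still remaining, which is 0 when
the timer expires normally, so "wait" only advances on an early wakeup,
and it is then compared against a timeout given in msecs. The timeout
exit path also appears to reach spin_unlock_irqrestore() without holding
priv_lock. Tracking an absolute deadline avoids the unit mixing. Below is
a minimal userspace sketch of that idea, not driver code: fake_now_ms,
ack_arrives_at_ms, ack_seen(), sleep_ms() and wait_for_ack() are all
hypothetical stand-ins for jiffies, the enab_dis_acked flag and
schedule_timeout(). As in the patch, a timeout of 0 means wait forever.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the clock and the ACK flag (not driver code). */
static unsigned long fake_now_ms;
static unsigned long ack_arrives_at_ms;

/* True once the simulated ACK has arrived. */
static bool ack_seen(void)
{
	return fake_now_ms >= ack_arrives_at_ms;
}

/* Simulated sleep: just advances the fake clock. */
static void sleep_ms(unsigned long ms)
{
	fake_now_ms += ms;
}

/*
 * Wait for the ACK, polling every 10 ms.
 * timeout_ms == 0 means wait indefinitely.
 * Returns 0 when the ACK was seen, -1 when the deadline passed first.
 */
static int wait_for_ack(unsigned long timeout_ms)
{
	unsigned long deadline = fake_now_ms + timeout_ms;

	while (!ack_seen()) {
		if (timeout_ms != 0 && fake_now_ms >= deadline)
			return -1;
		sleep_ms(10);
	}
	return 0;
}
```

In the driver itself the same shape would presumably use jiffies with
msecs_to_jiffies() and time_after(), and would take and drop priv_lock
symmetrically on every exit path.
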
> +
> +/* Wait indefinitely for IOVM to acknowledge enable request */
> +static int
> +virtnic_enable(struct net_device *netdev)
> +{
> +	return virtnic_enable_with_timeout(netdev,
> +		VIRTNIC_INFINITE_RESPONSE_WAIT);
> +}
> +
> +static void
> +send_rcv_posts_if_needed(struct virtnic_info *vnicinfo)
> +{
> +	int i;
> +	struct net_device *netdev;
> +	struct uiscmdrsp *cmdrsp = vnicinfo->cmdrsp_rcv;
> +	int cur_num_rcv_bufs_to_alloc, rcv_bufs_allocated;
> +
> +	if (!(vnicinfo->enabled && vnicinfo->enab_dis_acked)) {
> +		/* don't do this until vnic is marked ready. */
> +		return;
> +	}
> +	netdev = vnicinfo->netdev;
> +	rcv_bufs_allocated = 0;
> +	/* this code is trying to prevent getting stuck here forever,
> +	 * but still retry it if you can't allocate them all this
> +	 * time.
> +	 */
> +	cur_num_rcv_bufs_to_alloc = vnicinfo->num_rcv_bufs_could_not_alloc;
> +	while (cur_num_rcv_bufs_to_alloc > 0) {
> +		cur_num_rcv_bufs_to_alloc--;
> +		for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
> +			if (vnicinfo->rcvbuf[i] != NULL)
> +				continue;
> +			vnicinfo->rcvbuf[i] = alloc_rcv_buf(netdev);
> +			if (!vnicinfo->rcvbuf[i]) {
> +				LOGVER("**** %s FAILED to allocate new rcv buf - no REPOST\n",
> +				       netdev->name);
> +				vnicinfo->alloc_failed_in_if_needed_cnt++;
> +				break;
> +			} else {
> +				rcv_bufs_allocated++;
> +				post_skb(cmdrsp, vnicinfo,
> +					 vnicinfo->rcvbuf[i]);
> +				vnicinfo->datachan.chstat.extra_rcvbufs_sent++;
> +			}
> +		}
> +	}
> +	vnicinfo->num_rcv_bufs_could_not_alloc -= rcv_bufs_allocated;
> +	if (vnicinfo->num_rcv_bufs_could_not_alloc > 0) {
> +		/*
> +		 * this path means you failed to alloc an skb in the
> +		 * normal path, and you are trying again later, and
> +		 * it still fails.
> +		 */
> +		LOGVER("attempted to recover buffers which could not be allocated and failed");
> +		LOGVER("rcv_bufs_allocated=%d, num_rcv_bufs_could_not_alloc=%d",
> +		       rcv_bufs_allocated,
> +		       vnicinfo->num_rcv_bufs_could_not_alloc);
> +	}
> +}
> +
> +static void
> +drain_queue(struct datachan *dc, struct uiscmdrsp *cmdrsp,
> +	    struct virtnic_info *vnicinfo)
> +{
> +	unsigned long flags;
> +	int qrslt;
> +	struct net_device *netdev;
> +
> +	/* drain queue */
> +	while (1) {
> +		spin_lock_irqsave(&dc->chinfo.insertlock, flags);
> +		if (!spar_channel_client_acquire_os(dc->chinfo.queueinfo->chan,
> +						    "vnic")) {
> +			spin_unlock_irqrestore(&dc->chinfo.insertlock,
> +					       flags);
> +			break;
> +		}
> +		qrslt = uisqueue_get_cmdrsp(dc->chinfo.queueinfo, cmdrsp,
> +					    IOCHAN_FROM_IOPART);
> +		spar_channel_client_release_os(dc->chinfo.queueinfo->chan,
> +					       "vnic");
> +		spin_unlock_irqrestore(&dc->chinfo.insertlock, flags);
> +		if (qrslt == 0)
> +			break;	/* queue empty */
> +		DBGINF("%p cmdrsp->net.type:%d\n",
> +		       &dc->chinfo.queueinfo, cmdrsp->net.type);
> +		switch (cmdrsp->net.type) {
> +		case NET_RCV:
> +			DBGINF("Got NET_RCV\n");
> +			dc->chstat.got_rcv++;
> +			/* process incoming packet */
> +			virtnic_rx(cmdrsp);
> +			break;
> +		case NET_XMIT_DONE:
> +			DBGINF("Got NET_XMIT_DONE %p\n", cmdrsp->net.buf);
> +			spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +			dc->chstat.got_xmit_done++;
> +			if (cmdrsp->net.xmtdone.xmt_done_result) {
> +				LOGERRNAME(vnicinfo->netdev,
> +					   "XMIT_DONE failure buf:%p\n",
> +					   cmdrsp->net.buf);
> +				dc->chstat.xmit_fail++;
> +			}
> +			/* only call queue wake if we stopped it */
> +			netdev = ((struct sk_buff *)cmdrsp->net.buf)->dev;
> +			/* ASSERT netdev == vnicinfo->netdev; */
> +			if (netdev != vnicinfo->netdev) {
> +				LOGERRNAME(vnicinfo->netdev, "NET_XMIT_DONE something wrong; vnicinfo->netdev:%p != cmdrsp->net.buf)->dev:%p\n",
> +					   vnicinfo->netdev, netdev);
> +			} else if (netif_queue_stopped(netdev)) {
> +				/*
> +				 * check to see if we have crossed
> +				 * the lower watermark for
> +				 * netif_wake_queue()
> +				 */
> +				if (((vnicinfo->datachan.chstat.sent_xmit >=
> +				    vnicinfo->datachan.chstat.got_xmit_done) &&
> +				    (vnicinfo->datachan.chstat.sent_xmit -
> +				    vnicinfo->datachan.chstat.got_xmit_done <=
> +				    vnicinfo->lower_threshold_net_xmits)) ||
> +				    ((vnicinfo->datachan.chstat.sent_xmit <
> +				    vnicinfo->datachan.chstat.got_xmit_done) &&
> +				    (ULONG_MAX -
> +				    vnicinfo->datachan.chstat.got_xmit_done
> +				    + vnicinfo->datachan.chstat.sent_xmit <=
> +				    vnicinfo->lower_threshold_net_xmits))) {
> +					/*
> +					 * enough NET_XMITs completed
> +					 * so can restart netif queue
> +					 */
> +					netif_wake_queue(netdev);
> +					vnicinfo->flow_control_lower_hits++;
> +				}
> +			}
> +			skb_unlink(cmdrsp->net.buf, &vnicinfo->xmitbufhead);
> +			spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +			kfree_skb(cmdrsp->net.buf);
> +			break;
> +		case NET_RCV_ENBDIS_ACK:
> +			DBGINF("Got NET_RCV_ENBDIS_ACK on:%p\n",
> +			       (struct net_device *)
> +			       cmdrsp->net.enbdis.context);
> +			dc->chstat.got_enbdisack++;
> +			netdev = (struct net_device *)
> +				cmdrsp->net.enbdis.context;
> +			spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +			vnicinfo->enab_dis_acked = 1;
> +			spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +			if (vnicinfo->server_down &&
> +			    vnicinfo->server_change_state) {
> +				/* Inform Linux that the link is up */
> +				vnicinfo->server_down = false;
> +				vnicinfo->server_change_state = false;
> +				netif_wake_queue(netdev);
> +				netif_carrier_on(netdev);
> +			}
> +			break;
> +		case NET_CONNECT_STATUS:
> +			DBGINF("NET_CONNECT_STATUS, enable=:%d\n",
> +			       cmdrsp->net.enbdis.enable);
> +			netdev = vnicinfo->netdev;
> +			if (cmdrsp->net.enbdis.enable == 1) {
> +				spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +				vnicinfo->enabled = cmdrsp->net.enbdis.enable;
> +				spin_unlock_irqrestore(&vnicinfo->priv_lock,
> +						       flags);
> +				netif_wake_queue(netdev);
> +				netif_carrier_on(netdev);
> +			} else {
> +				netif_stop_queue(netdev);
> +				netif_carrier_off(netdev);
> +				spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +				vnicinfo->enabled = cmdrsp->net.enbdis.enable;
> +				spin_unlock_irqrestore(&vnicinfo->priv_lock,
> +						       flags);
> +			}
> +			break;
> +		default:
> +			LOGERRNAME(vnicinfo->netdev,
> +				   "Invalid net type:%d in cmdrsp\n",
> +				   cmdrsp->net.type);
> +			break;
> +		}
> +		/* cmdrsp is now available for reuse  */
> +
> +		if (dc->chinfo.threadinfo.should_stop)
> +			break;
> +	}
> +}
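
On the lower-watermark test in drain_queue() above: assuming sent_xmit
and got_xmit_done are unsigned long counters (the %lu format used
elsewhere in the patch suggests so), plain unsigned subtraction already
yields the number of outstanding transmits across a wrap, because the
arithmetic is modulo ULONG_MAX + 1. The explicit wrap branch,
ULONG_MAX - got_xmit_done + sent_xmit, comes out one short of that
value. A minimal userspace sketch follows; the helper names
xmits_outstanding() and below_low_watermark() are hypothetical and do
not exist in the driver.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Number of transmits sent but not yet completed.
 * Unsigned subtraction wraps modulo ULONG_MAX + 1, so this stays
 * correct even after 'sent' has wrapped past 'done'.
 */
static unsigned long xmits_outstanding(unsigned long sent, unsigned long done)
{
	return sent - done;
}

/* True when the outstanding count is at or below the low watermark. */
static bool below_low_watermark(unsigned long sent, unsigned long done,
				unsigned long low)
{
	return xmits_outstanding(sent, done) <= low;
}

/* The wrap-branch expression as written in the patch, for comparison. */
static unsigned long patch_wrap_formula(unsigned long sent, unsigned long done)
{
	return ULONG_MAX - done + sent;
}
```

With done = ULONG_MAX - 1 and sent = 2 (four transmits outstanding
across the wrap), the subtraction gives 4 while the patch's expression
gives 3. The same simplification would apply to the two threshold checks
in virtnic_xmit().
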
> +
> +static int
> +process_incoming_rsps(void *v)
> +{
> +	struct datachan *dc = v;
> +	struct uiscmdrsp *cmdrsp = NULL;
> +	const int SZ = SIZEOF_CMDRSP;
> +	struct virtnic_info *vnicinfo;
> +	struct channel_header __iomem *p_channel_header;
> +	struct signal_queue_header __iomem *pqhdr;
> +	uint64_t mask;
> +	unsigned long long rc1;
> +
> +	UIS_DAEMONIZE("vnic_incoming");
> +	DBGINF("In process_incoming_rsps pid:%d queueinfo:%p threadinfo:%p\n",
> +	       current->pid, dc->chinfo.queueinfo, &dc->chinfo.threadinfo);
> +	/* alloc once and reuse */
> +	vnicinfo = container_of(dc, struct virtnic_info, datachan);
> +	cmdrsp = kmalloc(SZ, GFP_KERNEL);
> +	if (cmdrsp == NULL) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "**** FAILED to malloc - thread exiting\n");
> +		complete_and_exit(&dc->chinfo.threadinfo.has_stopped, 0);
> +	}
> +	p_channel_header = vnicinfo->datachan.chinfo.queueinfo->chan;
> +	pqhdr =
> +	       (struct signal_queue_header __iomem *)
> +	       ((char __iomem *)p_channel_header +
> +	       readq(&p_channel_header->ch_space_offset)) +
> +	       IOCHAN_FROM_IOPART;
> +	mask = ULTRA_CHANNEL_ENABLE_INTS;
> +	while (1) {
> +		wait_event_interruptible_timeout(
> +			vnicinfo->rsp_queue, (atomic_read
> +					      (&vnicinfo->interrupt_rcvd) == 1),
> +			msecs_to_jiffies(vnicinfo->thread_wait_ms));
> +		/*
> +		 * periodically check to see if there are any rcv bufs which
> +		 * need to get sent to the iovm. This can only happen if
> +		 * we run out of memory when trying to allocate skbs.
> +		 */
> +		atomic_set(&vnicinfo->interrupt_rcvd, 0);
> +		send_rcv_posts_if_needed(vnicinfo);
> +		drain_queue(dc, cmdrsp, vnicinfo);
> +		rc1 = uisqueue_interlocked_or((uint64_t __iomem *)
> +					     vnicinfo->flags_addr, mask);
> +		if (dc->chinfo.threadinfo.should_stop)
> +			break;
> +	}
> +
> +	kfree(cmdrsp);
> +	DBGINF("process_incoming_rsps exiting\n");
> +	complete_and_exit(&dc->chinfo.threadinfo.has_stopped, 0);
> +}
> +
> +/*****************************************************/
> +/* NIC support functions called externally           */
> +/*****************************************************/
> +
> +static int
> +virtnic_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> +	/*
> +	 * we cannot willy-nilly change the MTU; it has to come from
> +	 * CONTROL VM and all the vnics and pnics in a switch have to
> +	 * have the same MTU for everything to work.
> +	 */
> +	LOGERRNAME(netdev, "netdev->name <<%s>>", netdev->name);
> +	LOGERRNAME(netdev, "**** FAILED: MTU cannot be changed at this end.\n");
> +	LOGERRNAME(netdev, "The same MTU is used for all the PNICs and VNICs in a switch.\n");
> +	LOGERRNAME(netdev, "Please change MTU from the Resource Partition\n");
> +	LOGERRNAME(netdev, "Current MTU is: %d\n", netdev->mtu);
> +	return -EINVAL;
> +}
> +
> +/*
> + * Called by kernel when ifconfig down is run.
> + * Returns 0 on success, negative value on failure.
> + */
> +static int
> +virtnic_close(struct net_device *netdev)
> +{
> +	/* this is called on ifconfig down but also if the device is
> +	 * being removed
> +	 */
> +	LOGINFNAME(netdev, "Closing %p name:%s\n", netdev, netdev->name);
> +
> +	netif_stop_queue(netdev);
> +	virtnic_disable(netdev);
> +
> +	LOGINFNAME(netdev, "Closed:%p\n", netdev);
> +
> +	return 0;
> +}
> +
> +static int
> +virtnic_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +/*
> + * Called by kernel when ifconfig up is run.
> + * Returns 0 on success, negative value on failure.
> + */
> +static int
> +virtnic_open(struct net_device *netdev)
> +{
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +	void *p = (__force void *)netdev->ip_ptr;
> +
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "Opening %p name:%s allocating:%d rcvbufs mtu:%d\n", netdev,
> +		   netdev->name, vnicinfo->num_rcv_bufs, netdev->mtu);
> +
> +	virtnic_enable(netdev);
> +	/* start the interface's transmit queue, allowing it to accept
> +	 * packets for transmission
> +	 */
> +	netif_start_queue(netdev);
> +
> +	LOGINFNAME(vnicinfo->netdev,
> +		   "Opened %p netdev->ip_ptr:%p name:%s %02x:%02x:%02x:%02x:%02x:%02x\n",
> +		   netdev, netdev->ip_ptr, netdev->name, netdev->dev_addr[0],
> +		   netdev->dev_addr[1], netdev->dev_addr[2],
> +		   netdev->dev_addr[3], netdev->dev_addr[4],
> +		   netdev->dev_addr[5]);
> +
> +	/*
> +	 * temporary code to trap the case where vnic inet addresses
> +	 * are getting trashed
> +	 */
> +	if (p != (__force void *)netdev->ip_ptr) {
> +		LOGERRNAME(vnicinfo->netdev, "***********FAILURE HAPPENED\n");
> +		LOGERRNAME(vnicinfo->netdev, "           Test to catch if vnic inet addresses are getting trashed.\n");
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		schedule_timeout(msecs_to_jiffies(1000));
> +	}
> +	return 0;
> +}
> +
> +static inline int
> +repost_return(
> +	struct uiscmdrsp *cmdrsp,
> +	struct virtnic_info *vnicinfo,
> +	struct sk_buff *skb,
> +	struct net_device *netdev)
> +{
> +	struct net_pkt_rcv copy;
> +	int i = 0, cc, numreposted;
> +	int found_skb = 0;
> +	int status = 0;
> +
> +	copy = cmdrsp->net.rcv;
> +	LOGVER("REPOST_RETURN: realloc rcv skbs to replace:%d rcvbufs\n",
> +	       copy.numrcvbufs);
> +	switch (copy.numrcvbufs) {
> +	case 0:
> +		vnicinfo->n_rcv0++;
> +		break;
> +	case 1:
> +		vnicinfo->n_rcv1++;
> +		break;
> +	case 2:
> +		vnicinfo->n_rcv2++;
> +		break;
> +	default:
> +		vnicinfo->n_rcvx++;
> +		break;
> +	}
> +	for (cc = 0, numreposted = 0; cc < copy.numrcvbufs; cc++) {
> +		for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
> +			if (vnicinfo->rcvbuf[i] != copy.rcvbuf[cc])
> +				continue;
> +
> +			LOGVER("REPOST_RETURN: orphaning old rcvbuf[%d]:%p cc=%d",
> +			       i, vnicinfo->rcvbuf[i], cc);
> +			vnicinfo->found_repost_rcvbuf_cnt++;
> +			if ((skb) && vnicinfo->rcvbuf[i] == skb) {
> +				found_skb = 1;
> +				vnicinfo->repost_found_skb_cnt++;
> +			}
> +			vnicinfo->rcvbuf[i] = alloc_rcv_buf(netdev);
> +			if (!vnicinfo->rcvbuf[i]) {
> +				LOGVER("**** %s FAILED to reallocate new rcv buf - no REPOST, found_skb=%d, cc=%d, i=%d\n",
> +				       netdev->name, found_skb, cc, i);
> +				vnicinfo->num_rcv_bufs_could_not_alloc++;
> +				vnicinfo->alloc_failed_in_repost_return_cnt++;
> +				status = -1;
> +				break;
> +			}
> +			LOGVER("REPOST_RETURN: reposting new rcvbuf[%d]:%p\n",
> +			       i, vnicinfo->rcvbuf[i]);
> +			post_skb(cmdrsp, vnicinfo, vnicinfo->rcvbuf[i]);
> +			numreposted++;
> +			break;
> +		}
> +	}
> +	LOGVER("REPOST_RETURN: num rcvbufs posted:%d\n", numreposted);
> +	if (numreposted != copy.numrcvbufs) {
> +		LOGVER("**** %s FAILED to repost all the rcv bufs; numreposted:%d rcv.numrcvbufs:%d\n",
> +		       netdev->name, numreposted, copy.numrcvbufs);
> +		vnicinfo->n_repost_deficit++;
> +		status = -1;
> +	}
> +	if (skb) {
> +		if (found_skb) {
> +			LOGVER("REPOST_RETURN: skb is %p - freeing it", skb);
> +			kfree_skb(skb);
> +		} else {
> +			LOGERRNAME(vnicinfo->netdev, "%s REPOST_RETURN: skb %p NOT found in rcvbuf list!!",
> +				   netdev->name, skb);
> +			status = -3;
> +			vnicinfo->bad_rcv_buf++;
> +		}
> +	}
> +	atomic_dec(&vnicinfo->usage);
> +	return status;
> +}
> +
> +static void
> +virtnic_rx(struct uiscmdrsp *cmdrsp)
> +{
> +	struct virtnic_info *vnicinfo;
> +	struct sk_buff *skb, *prev, *curr;
> +	struct net_device *netdev;
> +	int cc, currsize, off, status;
> +	struct ethhdr *eth;
> +	unsigned long flags;
> +#ifdef DEBUG
> +	struct phys_info testfrags[MAX_PHYS_INFO];
> +	int i;
> +#endif
> +
> +	/*
> +	 * Post a new rcv buf to the other end using the cmdrsp we have at
> +	 * hand. Post it without holding the lock; we use the signal lock
> +	 * to synchronize the queue insert. The cmdrsp that contains the
> +	 * net.rcv is the one we are using to repost, so copy the info we
> +	 * need from it.
> +	 */
> +	skb = cmdrsp->net.buf;
> +	netdev = skb->dev;
> +
> +	if (netdev)
> +		DBGINF("in virtnic_rx %p %s len:%d\n", netdev, netdev->name,
> +		       cmdrsp->net.rcv.rcv_done_len);
> +	else {
> +		/* We must have previously downed this network device, so
> +		 * this skb and device are no longer valid. This also means
> +		 * the skb reference was removed from vnicinfo->rcvbuf so no
> +		 * need to search for it.
> +		 * All we can do is free the skb and return.
> +		 * Note: We crash if we try to log this here.
> +		 */
> +		kfree_skb(skb);
> +		return;
> +	}
> +
> +	vnicinfo = netdev_priv(netdev);
> +
> +	spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +	atomic_dec(&vnicinfo->num_rcv_bufs_in_iovm);
> +
> +	/* update rcv stats - call it with priv_lock held */
> +	UPD_RCV_STATS;
> +
> +	atomic_inc(&vnicinfo->usage);	/* don't want a close to happen before
> +					   we're done here */
> +	/*
> +	 * set length to how much was ACTUALLY received -
> +	 * NOTE: rcv_done_len includes actual length of data rcvd
> +	 * including ethhdr
> +	 */
> +	skb->len = cmdrsp->net.rcv.rcv_done_len;
> +
> +	/* test enabled while holding lock */
> +	if (!(vnicinfo->enabled && vnicinfo->enab_dis_acked)) {
> +		/*
> +		 * don't process it unless we're in enable mode and until
> +		 * we've gotten an ACK saying the other end got our RCV enable
> +		 */
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "%s dropping packet - perhaps old\n", netdev->name);
> +		spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +		if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
> +			LOGERRNAME(vnicinfo->netdev, "repost_return failed");
> +		return;
> +	}
> +
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +	/*
> +	 * When the skb was allocated, skb->dev, skb->data, skb->len and
> +	 * skb->data_len were set up, and the data has already been put
> +	 * into the skb (both the first frag and the frags pages).
> +	 * NOTE: firstfragslen is the amount of data in skb->data, i.e.
> +	 * the part that is not in nr_frags or frag_list. This is now
> +	 * simply RCVPOST_BUF_SIZE. Bump tail to show how much data is in
> +	 * the first frag, set data_len to show the rest, and see if we
> +	 * have to chain frag_list.
> +	 */
> +	if (skb->len > RCVPOST_BUF_SIZE) {	/* do PRECAUTIONARY check */
> +		if (cmdrsp->net.rcv.numrcvbufs < 2) {
> +			LOGERRNAME(vnicinfo->netdev, "**** %s Something is wrong; rcv_done_len:%d > RCVPOST_BUF_SIZE:%d but numrcvbufs:%d < 2\n",
> +				   netdev->name, skb->len, RCVPOST_BUF_SIZE,
> +				   cmdrsp->net.rcv.numrcvbufs);
> +			if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
> +				LOGERRNAME(vnicinfo->netdev,
> +					   "repost_return failed");
> +			return;
> +		}
> +		/* length rcvd is greater than firstfrag in this skb rcv buf  */
> +		skb->tail += RCVPOST_BUF_SIZE;	/* amount in skb->data */
> +		skb->data_len = skb->len - RCVPOST_BUF_SIZE;	/* amount that
> +								   will be in
> +								   frag_list */
> +		DBGINF("len:%d data:%d\n", skb->len, skb->data_len);
> +	} else {
> +		/*
> +		 * data fits in this skb - no chaining - do PRECAUTIONARY check
> +		 */
> +		if (cmdrsp->net.rcv.numrcvbufs != 1) {	/* should be 1 */
> +			LOGERRNAME(vnicinfo->netdev, "**** %s Something is wrong; rcv_done_len:%d <= RCVPOST_BUF_SIZE:%d but numrcvbufs:%d != 1\n",
> +				   netdev->name, skb->len, RCVPOST_BUF_SIZE,
> +				   cmdrsp->net.rcv.numrcvbufs);
> +			if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
> +				LOGERRNAME(vnicinfo->netdev,
> +					   "repost_return failed");
> +			return;
> +		}
> +		skb->tail += skb->len;
> +		skb->data_len = 0;	/* nothing rcvd in frag_list */
> +	}
> +	off = skb_tail_pointer(skb) - skb->data;
> +	/*
> +	 * off is the amount we bumped tail by in the head skb.
> +	 * It is used to calculate the size of each chained skb below;
> +	 * it is also used to index into bufline to continue the copy
> +	 * (for chansocktwopc).
> +	 * If necessary, chain the rcv skbs together.
> +	 * NOTE: index 0 holds the same skb as cmdrsp->net.rcv.skb; we
> +	 * need to chain the rest to that one.
> +	 * - do PRECAUTIONARY check
> +	 */
> +	if (cmdrsp->net.rcv.rcvbuf[0] != skb) {
> +		LOGERRNAME(vnicinfo->netdev, "**** %s Something is wrong; rcvbuf[0]:%p != skb:%p\n",
> +			   netdev->name, cmdrsp->net.rcv.rcvbuf[0], skb);
> +		if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
> +			LOGERRNAME(vnicinfo->netdev, "repost_return failed");
> +		return;
> +	}
> +
> +	if (cmdrsp->net.rcv.numrcvbufs > 1) {
> +		/* chain the various rcv buffers into the skb's frag_list. */
> +		/* Note: off was initialized above  */
> +		for (cc = 1, prev = NULL;
> +		     cc < cmdrsp->net.rcv.numrcvbufs; cc++) {
> +			curr = (struct sk_buff *)cmdrsp->net.rcv.rcvbuf[cc];
> +			curr->next = NULL;
> +			DBGINF("chaining skb:%p data:%p to skb:%p data:%p\n",
> +			       curr, curr->data, skb, skb->data);
> +			if (prev == NULL)	/* start of list- set head */
> +				skb_shinfo(skb)->frag_list = curr;
> +			else
> +				prev->next = curr;
> +			prev = curr;
> +			/*
> +			 * should we set skb->len and skb->data_len for each
> +			 * buffer being chained??? can't hurt!
> +			 */
> +			currsize =
> +			    min(skb->len - off,
> +				(unsigned int)RCVPOST_BUF_SIZE);
> +			curr->len = currsize;
> +			curr->tail += currsize;
> +			curr->data_len = 0;
> +			off += currsize;
> +		}
> +#ifdef DEBUG
> +		/* assert skb->len == off */
> +		if (skb->len != off) {
> +			LOGERRNAME(vnicinfo->netdev, "%s something wrong; skb->len:%d != off:%d\n",
> +				   netdev->name, skb->len, off);
> +		}
> +		/* test code */
> +		cc = uisutil_copy_fragsinfo_from_skb("rcvchaintest", skb,
> +						     RCVPOST_BUF_SIZE,
> +						     MAX_PHYS_INFO, testfrags);
> +		LOGINFNAME(vnicinfo->netdev, "rcvchaintest returned:%d\n", cc);
> +		if (cc != cmdrsp->net.rcv.numrcvbufs) {
> +			LOGERRNAME(vnicinfo->netdev, "**** %s Something wrong; rcvd chain length %d different from one we calculated %d\n",
> +				   netdev->name, cmdrsp->net.rcv.numrcvbufs,
> +				   cc);
> +		}
> +		for (i = 0; i < cc; i++) {
> +			LOGINFNAME(vnicinfo->netdev, "test:RCVPOST_BUF_SIZE:%d[%d] pfn:%llu off:0x%x len:%d\n",
> +				   RCVPOST_BUF_SIZE, i, testfrags[i].pi_pfn,
> +				   testfrags[i].pi_off, testfrags[i].pi_len);
> +		}
> +#endif
> +	}
> +
> +	/* set up packet's protocol type using ethernet header - this
> +	 * sets up skb->pkt_type & it also PULLS out the eth header
> +	 */
> +	skb->protocol = eth_type_trans(skb, netdev);
> +
> +	eth = eth_hdr(skb);
> +
> +	DBGINF("%d Src:%02x:%02x:%02x:%02x:%02x:%02x Dest:%02x:%02x:%02x:%02x:%02x:%02x proto:%x\n",
> +	       skb->pkt_type, eth->h_source[0], eth->h_source[1],
> +	       eth->h_source[2], eth->h_source[3], eth->h_source[4],
> +	       eth->h_source[5], eth->h_dest[0], eth->h_dest[1], eth->h_dest[2],
> +	       eth->h_dest[3], eth->h_dest[4], eth->h_dest[5], eth->h_proto);
> +
> +	skb->csum = 0;
> +	skb->ip_summed = CHECKSUM_NONE;	/* trust me, the checksum has
> +					   been verified */
> +
> +	do {
> +		if (netdev->flags & IFF_PROMISC) {
> +			DBGINF("IFF_PROMISC is set.\n");
> +			break;	/* accept all packets */
> +		}
> +		if (skb->pkt_type == PACKET_BROADCAST) {
> +			DBGINF("packet is broadcast.\n");
> +			if (netdev->flags & IFF_BROADCAST) {
> +				DBGINF("IFF_BROADCAST is set.\n");
> +				break;	/* accept all broadcast packets */
> +			}
> +		} else if (skb->pkt_type == PACKET_MULTICAST) {
> +			DBGINF("packet is multicast.\n");
> +			if (netdev->flags & IFF_ALLMULTI)
> +				DBGINF("IFF_ALLMULTI is set.\n");
> +			if ((netdev->flags & IFF_MULTICAST) &&
> +			    (netdev_mc_count(netdev))) {
> +				struct netdev_hw_addr *ha;
> +				int found_mc = 0;
> +
> +				DBGINF("IFF_MULTICAST is set %d.\n",
> +				       netdev_mc_count(netdev));
> +				/*
> +				 * only accept multicast packets that we can
> +				 * find in our multicast address list
> +				 */
> +				netdev_for_each_mc_addr(ha, netdev) {
> +					if (memcmp
> +					    (eth->h_dest, ha->addr,
> +					     MAX_MACADDR_LEN) == 0) {
> +						DBGINF("multicast address is in our list.\n");
> +						found_mc = 1;
> +						break;
> +					}
> +				}
> +				if (found_mc) {
> +					break;	/* accept packet, dest
> +						   matches a multicast
> +						   address */
> +				}
> +			}
> +		} else if (skb->pkt_type == PACKET_HOST) {
> +			DBGINF("packet is directed.\n");
> +			break;	/* accept packet, h_dest must match vnic
> +				   mac address */
> +		} else if (skb->pkt_type == PACKET_OTHERHOST) {
> +			/* something is not right */
> +			LOGERRNAME(vnicinfo->netdev, "**** FAILED to deliver rcv packet to OS; name:%s Dest:%02x:%02x:%02x:%02x:%02x:%02x VNIC:%02x:%02x:%02x:%02x:%02x:%02x\n",
> +				   netdev->name, eth->h_dest[0], eth->h_dest[1],
> +				   eth->h_dest[2], eth->h_dest[3],
> +				   eth->h_dest[4], eth->h_dest[5],
> +				   netdev->dev_addr[0], netdev->dev_addr[1],
> +				   netdev->dev_addr[2], netdev->dev_addr[3],
> +				   netdev->dev_addr[4], netdev->dev_addr[5]);
> +		}
> +		/* drop packet - don't forward it up to OS */
> +		DBGINF("we cannot indicate this recv pkt! (netdev->flags:0x%04x, skb->pkt_type:0x%02x).\n",
> +		       netdev->flags, skb->pkt_type);
> +		vnicinfo->n_rcv_packet_not_accepted++;
> +		if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
> +			LOGERRNAME(vnicinfo->netdev, "repost_return failed");
> +		return;
> +	} while (0);
> +
> +	DBGINF("Calling netif_rx skb:%p head:%p end:%p data:%p tail:%p len:%d data_len:%d skb->nr_frags:%d\n",
> +	       skb, skb->head, skb->end, skb->data, skb->tail, skb->len,
> +	       skb->data_len, skb_shinfo(skb)->nr_frags);
> +
> +	status = netif_rx(skb);
> +	if (status != NET_RX_SUCCESS)
> +		LOGWRNNAME(vnicinfo->netdev, "status=%d\n", status);
> +	/*
> +	 * netif_rx returns various values, but "in practice most drivers
> +	 * ignore the return value"
> +	 */
> +
> +	skb = NULL;
> +	/*
> +	 * whether the packet got dropped or handled, the skb is freed by
> +	 * kernel code, so we shouldn't free it. but we should repost a
> +	 * new rcv buffer.
> +	 */
> +	if (repost_return(cmdrsp, vnicinfo, skb, netdev) < 0)
> +		LOGVER("repost_return failed");
> +	return;
> +}
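
For reference, the chaining loop in virtnic_rx() above gives the head
skb RCVPOST_BUF_SIZE bytes and then sizes each chained buffer as
min(skb->len - off, RCVPOST_BUF_SIZE), so only the last buffer is
partially filled. A minimal userspace sketch of that split, under
stated assumptions: split_rcv_len() is a hypothetical helper, and the
buffer size is passed in because the real value of RCVPOST_BUF_SIZE is
not shown in this part of the patch.

```c
/*
 * Split total_len bytes across fixed-size receive buffers.
 * sizes[k] receives the byte count placed in buffer k.
 * Returns the number of buffers used, or -1 if buf_size is 0 or
 * max_bufs buffers are not enough to hold total_len.
 */
static int split_rcv_len(unsigned int total_len, unsigned int buf_size,
			 unsigned int *sizes, int max_bufs)
{
	unsigned int off = 0;
	int n = 0;

	if (buf_size == 0)
		return -1;

	while (off < total_len) {
		unsigned int chunk = total_len - off;

		/* same as min(total_len - off, buf_size) */
		if (chunk > buf_size)
			chunk = buf_size;
		if (n == max_bufs)
			return -1;
		sizes[n++] = chunk;
		off += chunk;
	}
	return n;
}
```

The returned count corresponds to what the patch expects in
cmdrsp->net.rcv.numrcvbufs, which is what the PRECAUTIONARY checks
compare against.
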
> +
> +/*
> + * This function is protected from concurrent calls by a spinlock xmit_lock
> + * in the net_device struct, but as soon as the function returns it can be
> + * called again.
> + * Returns NETDEV_TX_OK (0) if the packet was accepted,
> + * NETDEV_TX_BUSY otherwise.
> + */
> +static int
> +virtnic_xmit(struct sk_buff *skb, struct net_device *netdev)
> +{
> +	struct virtnic_info *vnicinfo;
> +	int len, firstfraglen, padlen;
> +	struct uiscmdrsp *cmdrsp = NULL;
> +	unsigned long flags;
> +	int qrslt;
> +
> +/* Note: NETDEV_TX_OK is 0, NETDEV_TX_BUSY is 1. */
> +#define BUSY { \
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags); \
> +	vnicinfo->busy_cnt++; \
> +	return NETDEV_TX_BUSY; \
> +}
> +
> +/* return value NETDEV_TX_OK == 0 */
> +	DBGINF("got xmit for netdev:%p %s len:%d ip_summed:%d skb->data:%p data_len:%d transport_header:%p maxdatalen:%d\n",
> +	       netdev, netdev->name, skb->len, skb->ip_summed, skb->data,
> +	       skb->data_len, skb_transport_header(skb),
> +	       (int)(skb_end_pointer(skb) - skb->data));
> +
> +	vnicinfo = netdev_priv(netdev);
> +	spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +	/*Modified for Trac #2395 FIX TEL_CKS */
> +	if (netif_queue_stopped(netdev)) {
> +		LOGINFNAME(vnicinfo->netdev,
> +			   "Returning Busy because queue is stopped\n");
> +		BUSY;
> +	}
> +	if (vnicinfo->server_down || vnicinfo->server_change_state) {
> +		LOGINFNAME(vnicinfo->netdev, "Returning BUSY because server is down/changing state\n");
> +		BUSY;
> +	}
> +	/*
> +	 * sk_buff struct is used to host network data throughout all the
> +	 * Linux network subsystems
> +	 */
> +	len = skb->len;
> +	/*
> +	 * skb->len is the FULL length of data (including fragmentary portion)
> +	 * skb->data_len is the length of the fragment portion in frags
> +	 * skb->len - skb->data_len is the size of the 1st fragment in skb->data
> +	 * calculate the length of the first fragment that skb->data is
> +	 * pointing to
> +	 */
> +	firstfraglen = skb->len - skb->data_len;
> +	if (firstfraglen < ETH_HEADER_SIZE) {
> +		LOGERRNAME(vnicinfo->netdev, "first fragment in skb->data too small for ethernet header len:%d data_len:%d\n",
> +			   skb->len, skb->data_len);
> +		BUSY;		/* NOT LIKELY TO HAPPEN */
> +	}
> +
> +	if ((len < ETH_MIN_PACKET_SIZE) &&
> +	    ((skb_end_pointer(skb) - skb->data) >= ETH_MIN_PACKET_SIZE)) {
> +		/* pad the packet out to minimum size */
> +		padlen = ETH_MIN_PACKET_SIZE - len;
> +		DBGINF("padding %d\n", padlen);
> +		memset(&skb->data[len], 0, padlen);
> +		skb->tail += padlen;
> +		skb->len += padlen;
> +		len += padlen;
> +		firstfraglen += padlen;
> +	}
> +
> +	cmdrsp = vnicinfo->xmit_cmdrsp;
> +	/* clear cmdrsp */
> +	memset(cmdrsp, 0, SIZEOF_CMDRSP);
> +	cmdrsp->net.type = NET_XMIT;
> +	cmdrsp->cmdtype = CMD_NET_TYPE;
> +
> +	/* save the pointer to skb - we'll need it for completion */
> +	cmdrsp->net.buf = skb;
> +
> +	if (((vnicinfo->datachan.chstat.sent_xmit >=
> +	      vnicinfo->datachan.chstat.got_xmit_done) &&
> +	     (vnicinfo->datachan.chstat.sent_xmit -
> +	     vnicinfo->datachan.chstat.got_xmit_done >=
> +	     vnicinfo->max_outstanding_net_xmits)) ||
> +	    /* OR check wrap condition */
> +	    ((vnicinfo->datachan.chstat.sent_xmit <
> +	      vnicinfo->datachan.chstat.got_xmit_done) &&
> +	      (ULONG_MAX - vnicinfo->datachan.chstat.got_xmit_done +
> +	       vnicinfo->datachan.chstat.sent_xmit >=
> +	       vnicinfo->max_outstanding_net_xmits))
> +	    ) {
> +		/*
> +		 * too many NET_XMITs queued over to IOVM - need to wait
> +		 * Might need to remove the below message as these might be
> +		 * excessive under load.
> +		 */
> +		vnicinfo->datachan.chstat.reject_count++;
> +		if (!vnicinfo->queuefullmsg_logged &&
> +		    ((vnicinfo->datachan.chstat.reject_count & 0x3ff) ==
> +			1)) {
> +			vnicinfo->queuefullmsg_logged = 1;
> +#if VIRTNIC_STATS
> +			vnicinfo->datachan.chstat.reject_jiffies_start =
> +			    jiffies;
> +#endif
> +			LOGINFNAME(vnicinfo->netdev, "**** REJECTING NET_XMIT - rejected count=%ld chstat.sent_xmit=%lu chstat.got_xmit_done=%lu\n",
> +				   vnicinfo->datachan.chstat.reject_count,
> +				   vnicinfo->datachan.chstat.sent_xmit,
> +				   vnicinfo->datachan.chstat.got_xmit_done);
> +		}
> +		netif_stop_queue(netdev);	/* calling stop queue */
> +		BUSY;		/* return status that packet not accepted */
> +	} else if (vnicinfo->queuefullmsg_logged) {
> +#if VIRTNIC_STATS
> +		LOGINFNAME(vnicinfo->netdev, "**** NET_XMITs now working again - rejected count = %ld msec = %ld\n",
> +			   vnicinfo->datachan.chstat.reject_count,
> +			   ((long)jiffies -
> +			   (long)(vnicinfo->datachan.chstat.
> +				    reject_jiffies_start)) * 1000 / HZ);
> +#else
> +		LOGINFNAME(vnicinfo->netdev, "**** NET_XMITs now working again - rejected count = %ld\n",
> +			   vnicinfo->datachan.chstat.reject_count);
> +#endif
> +		/* queue is not blocked so reset the logging flag */
> +		vnicinfo->queuefullmsg_logged = 0;
> +	}
> +
> +	if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
> +		DBGINF("CHECKSUM_HW protocol:%x csum:%x gso_size:%x data:%p transport_header:%p network_header:%p\n",
> +		       skb->protocol, skb->csum, skb_shinfo(skb)->gso_size,
> +		       skb->data, skb_transport_header(skb),
> +		       skb_network_header(skb));
> +		cmdrsp->net.xmt.lincsum.valid = 1;
> +		cmdrsp->net.xmt.lincsum.protocol = skb->protocol;
> +		if (skb_transport_header(skb) > skb->data) {
> +			cmdrsp->net.xmt.lincsum.hrawoff =
> +				skb_transport_header(skb) - skb->data;
> +			cmdrsp->net.xmt.lincsum.hrawoffv = 1;
> +		}
> +		if (skb_network_header(skb) > skb->data) {
> +			cmdrsp->net.xmt.lincsum.nhrawoff =
> +			    skb_network_header(skb) - skb->data;
> +			cmdrsp->net.xmt.lincsum.nhrawoffv = 1;
> +		}
> +		cmdrsp->net.xmt.lincsum.csum = skb->csum;
> +	} else {
> +		cmdrsp->net.xmt.lincsum.valid = 0;
> +	}
> +	/* save off the length of the entire data packet */
> +	cmdrsp->net.xmt.len = len;	/* total data length */
> +	/*
> +	 * copy ethernet header from first frag into cmdrsp
> +	 * - everything else will be passed in frags & DMA'ed
> +	 */
> +	memcpy(cmdrsp->net.xmt.ethhdr, skb->data, ETH_HEADER_SIZE);
> +	/*
> +	 * copy frags info - from skb->data we need to only provide access
> +	 * beyond eth header
> +	 */
> +	cmdrsp->net.xmt.num_frags =
> +	    uisutil_copy_fragsinfo_from_skb("virtnic_xmit", skb, firstfraglen,
> +					    MAX_PHYS_INFO,
> +					    cmdrsp->net.xmt.frags);
> +	if (cmdrsp->net.xmt.num_frags == -1) {
> +		LOGERRNAME(vnicinfo->netdev, "**** FAILED to copy fragsinfo\n");
> +		BUSY;		/* WILL HAPPEN ONLY IF FRAG ARRAY WITH
> +				   MAX_PHYS_INFO ENTRIES IS NOT ENOUGH */
> +	}
> +
> +	DBGINF("Forwarding packet cmdrsp:%p\n", cmdrsp);
> +
> +	/*
> +	 * don't hold lock when forwarding xmit - if queue is full insert
> +	 * might sleep
> +	 */
> +	qrslt = uisqueue_put_cmdrsp_with_lock_client(
> +			vnicinfo->datachan.chinfo.queueinfo, cmdrsp,
> +			IOCHAN_TO_IOPART,
> +			(void *)&vnicinfo->datachan.chinfo.insertlock,
> +			DONT_ISSUE_INTERRUPT, (uint64_t)NULL,
> +			0 /* don't wait */ ,
> +			"vnic");
> +	if (!qrslt) {
> +		/* failed to queue xmit - return busy */
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "**** FAILED to insert NET_XMIT\n");
> +		netif_stop_queue(netdev);	/* calling stop queue  */
> +		BUSY;		/* return status that packet not accepted */
> +	}
> +	/* Track the skbs that have been sent to the IOVM for XMIT */
> +	skb_queue_head(&vnicinfo->xmitbufhead, skb);
> +
> +	/*
> +	 * set the last transmission start time
> +	 * linux docs says:  Do not forget to update netdev->trans_start to
> +	 * jiffies after each new tx packet is given to the hardware.
> +	 */
> +	netdev->trans_start = jiffies;	/* some code in Linux uses this. */
> +
> +	/* update xmt stats */
> +	UPD_XMT_STATS;
> +	vnicinfo->datachan.chstat.sent_xmit++;
> +
> +	/*
> +	 * check to see if we have hit the high watermark for
> +	 * netif_stop_queue()
> +	 */
> +	if (((vnicinfo->datachan.chstat.sent_xmit >=
> +	      vnicinfo->datachan.chstat.got_xmit_done) &&
> +	     (vnicinfo->datachan.chstat.sent_xmit -
> +	      vnicinfo->datachan.chstat.got_xmit_done >=
> +	      vnicinfo->upper_threshold_net_xmits)) ||
> +	    /* OR check wrap condition */
> +	    ((vnicinfo->datachan.chstat.sent_xmit <
> +	      vnicinfo->datachan.chstat.got_xmit_done) &&
> +	      (ULONG_MAX - vnicinfo->datachan.chstat.got_xmit_done +
> +	       vnicinfo->datachan.chstat.sent_xmit >=
> +	       vnicinfo->upper_threshold_net_xmits))
> +	   ) {
> +		/* too many NET_XMITs queued over to IOVM - need to wait */
> +		netif_stop_queue(netdev); /* calling stop queue - call
> +					     netif_wake_queue() after lower
> +					     threshold */
> +		vnicinfo->flow_control_upper_hits++;
> +	}
> +
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +	/* skb will be freed when we get back NET_XMIT_DONE */
> +	return NETDEV_TX_OK;
> +}
> +
> +static void
> +virtnic_serverdown_complete(struct work_struct *work)
> +{
> +	struct virtnic_info *vnicinfo;
> +	struct net_device *netdev;
> +	struct virtpci_dev *virtpcidev;
> +	unsigned long flags;
> +	int i = 0, count = 0;
> +
> +	vnicinfo =
> +	    container_of(work, struct virtnic_info, serverdown_completion);
> +	netdev = vnicinfo->netdev;
> +	virtpcidev = vnicinfo->virtpcidev;
> +
> +	DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>", virtpcidev->busNo,
> +	       virtpcidev->deviceNo);
> +	DBGINF("net_device name<<%s>>", netdev->name);
> +	/* Stop Using Datachan */
> +	uisthread_stop(&vnicinfo->datachan.chinfo.threadinfo);
> +
> +	/* Inform Linux that the link is down */
> +	netif_carrier_off(netdev);
> +	netif_stop_queue(netdev);
> +
> +	/*
> +	 * Free the skb for XMITs that haven't been serviced by the server
> +	 * We shouldn't have to inform Linux about these IOs because they
> +	 * are "lost in the ethernet"
> +	 */
> +	skb_queue_purge(&vnicinfo->xmitbufhead);
> +
> +	spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +	/* free rcv buffers */
> +	for (i = 0; i < vnicinfo->num_rcv_bufs; i++) {
> +		if (vnicinfo->rcvbuf[i]) {
> +			kfree_skb(vnicinfo->rcvbuf[i]);
> +			vnicinfo->rcvbuf[i] = NULL;
> +			count++;
> +		}
> +	}
> +	atomic_set(&vnicinfo->num_rcv_bufs_in_iovm, 0);
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +	LOGINFNAME(vnicinfo->netdev, "Closed:%p Freed %d rcv bufs\n", netdev,
> +		   count);
> +
> +	vnicinfo->server_down = true;
> +	vnicinfo->server_change_state = false;
> +	visorchipset_device_pause_response(virtpcidev->bus_no,
> +					   virtpcidev->device_no, 0);
> +}
> +
> +/* As per VirtpciFunc returns 1 for success and 0 for failure */
> +static int
> +virtnic_serverdown(struct virtpci_dev *virtpcidev, u32 state)
> +{
> +	struct net_device *netdev = virtpcidev->net.netdev;
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +
> +	DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>", virtpcidev->busNo,
> +	       virtpcidev->deviceNo);
> +	DBGINF("entering virtnic_serverdown");
> +
> +	if (!vnicinfo->server_down && !vnicinfo->server_change_state) {
> +		vnicinfo->server_change_state = true;
> +		queue_work(virtnic_serverdown_workqueue,
> +			   &vnicinfo->serverdown_completion);
> +	} else if (vnicinfo->server_change_state) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "Server already processing change state message.");
> +		return 0;
> +	} else
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "Server already down, but another server down message received.");
> +	DBGINF("exiting virtnic_serverdown");
> +	return 1;
> +}
> +
> +/* As per VirtpciFunc returns 1 for success and 0 for failure */
> +static int
> +virtnic_serverup(struct virtpci_dev *virtpcidev)
> +{
> +	struct net_device *netdev = virtpcidev->net.netdev;
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +	unsigned long flags;
> +
> +	DBGINF("entering virtnic_serverup");
> +	DBGINF("virtpcidev busNo<<%d>>devNo<<%d>>", virtpcidev->busNo,
> +	       virtpcidev->deviceNo);
> +	DBGINF("net_device name<<%s>>", netdev->name);
> +	if (vnicinfo->server_down && !vnicinfo->server_change_state) {
> +		vnicinfo->server_change_state = true;
> +		/*
> +		 * Must transition channel to ATTACHED state BEFORE we can
> +		 * start using the device again
> +		 */
> +		SPAR_CHANNEL_CLIENT_TRANSITION(vnicinfo->datachan.chinfo.
> +					       queueinfo->chan,
> +					       dev_name(&virtpcidev->
> +							generic_dev),
> +					       CHANNELCLI_ATTACHED, NULL);
> +
> +		if (!uisthread_start(&vnicinfo->datachan.chinfo.threadinfo,
> +				     process_incoming_rsps,
> +				     &vnicinfo->datachan, "vnic_incoming")) {
> +			LOGERRNAME(vnicinfo->netdev,
> +				   "**** FAILED to start thread\n");
> +			return 0;
> +		}
> +
> +		init_rcv_bufs(netdev, vnicinfo);
> +
> +		spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +		vnicinfo->enabled = 1;
> +		/*
> +		 * now we're ready, let's send an ENB to uisnic
> +		 * but until we get an ACK back from uisnic, we'll drop
> +		 * the packets
> +		 */
> +		vnicinfo->enab_dis_acked = 0;
> +		spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +		/*
> +		 * send enable and wait for ack - don't hold lock when
> +		 * sending enable because if the queue is full, insert
> +		 * might sleep.
> +		 */
> +		SEND_ENBDIS(netdev, 1, vnicinfo->cmdrsp_rcv,
> +			    vnicinfo->datachan.chinfo.queueinfo,
> +			    &vnicinfo->datachan.chinfo.insertlock,
> +			    vnicinfo->datachan.chstat);
> +	} else if (vnicinfo->server_change_state) {
> +		LOGERRNAME(vnicinfo->netdev,
> +			   "Server already processing change state message.");
> +		return 0;
> +	} else {
> +		DBGINF("Server up message received for server that was already up.");
> +	}
> +	DBGINF("exiting virtnic_serverup");
> +	return 1;
> +}
> +
> +static void
> +virtnic_timeout_reset(struct work_struct *work)
> +{
> +	struct virtnic_info *vnicinfo;
> +	struct net_device *netdev;
> +	struct virtpci_dev *virtpcidev;
> +	int response = 0;
> +
> +	vnicinfo = container_of(work, struct virtnic_info, timeout_reset);
> +	netdev = vnicinfo->netdev;
> +
> +	DBGINF("net_device name<<%s>>", netdev->name);
> +	/* Transmit timeouts are typically handled by resetting the
> +	 * device. For our virtual NIC we send a Disable and Enable
> +	 * to the IOVM. If it doesn't respond, we trigger a
> +	 * serverdown.
> +	 */
> +	DBGINF("Disabling connection to server.\n");
> +	netif_stop_queue(netdev);
> +	response = virtnic_disable_with_timeout(netdev, 100);
> +	if (response != 0)
> +		goto call_serverdown;
> +
> +	DBGINF("Disable returned so reenable connection to server.\n");
> +	response = virtnic_enable_with_timeout(netdev, 100);
> +	if (response != 0)
> +		goto call_serverdown;
> +	netif_wake_queue(netdev);
> +
> +	LOGWRNNAME(vnicinfo->netdev, "Virtual connection reset.\n");
> +	return;
> +
> +call_serverdown:
> +	LOGERRNAME(vnicinfo->netdev,
> +		   "Disable/enabled Pair failed to return so start serverdown.\n");
> +	virtpcidev = vnicinfo->virtpcidev;
> +	virtnic_serverdown(virtpcidev, 0);
> +	return;
> +}
> +
> +static void
> +virtnic_xmit_timeout(struct net_device *netdev)
> +{
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +	unsigned long flags;
> +
> +	LOGWRNNAME(vnicinfo->netdev,
> +		   "Transmit Timeout.  Resetting virtual connection.\n");
> +	LOGWRNNAME(vnicinfo->netdev, "net_device name<<%s>>", netdev->name);
> +
> +	spin_lock_irqsave(&vnicinfo->priv_lock, flags);
> +	/* Ensure that a ServerDown message hasn't been received */
> +	if (!vnicinfo->enabled ||
> +	    (vnicinfo->server_down && !vnicinfo->server_change_state)) {
> +		spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +		return;
> +	}
> +	spin_unlock_irqrestore(&vnicinfo->priv_lock, flags);
> +
> +	queue_work(virtnic_timeout_reset_workqueue, &vnicinfo->timeout_reset);
> +}
> +
> +static void
> +virtnic_set_multi(struct net_device *netdev)
> +{
> +	struct uiscmdrsp *cmdrsp;
> +	struct virtnic_info *vnicinfo = netdev_priv(netdev);
> +
> +	DBGINF("net_device name<<%s>>", netdev->name);
> +	DBGINF("entering virtnic_set_multi\n");
> +
> +	/* any filtering changes? */
> +	if (vnicinfo->old_flags != netdev->flags) {
> +		LOGINFNAME(vnicinfo->netdev,
> +			   "old filter = 0x%04x, new filter = 0x%04x.\n",
> +			   vnicinfo->old_flags, netdev->flags);
> +		if ((netdev->flags & IFF_PROMISC) !=
> +		    (vnicinfo->old_flags & IFF_PROMISC)) {
> +			LOGINFNAME(vnicinfo->netdev,
> +				   "we are %s promiscuous mode.\n",
> +				   (netdev->
> +				    flags & IFF_PROMISC) ? "entering" :
> +				   "exiting");
> +			cmdrsp = kmalloc(SIZEOF_CMDRSP, GFP_ATOMIC);
> +			if (cmdrsp == NULL) {
> +				LOGERRNAME(vnicinfo->netdev,
> +					   "**** FAILED to kmalloc cmdrsp.\n");
> +				return;
> +			}
> +			memset(cmdrsp, 0, SIZEOF_CMDRSP);
> +			cmdrsp->cmdtype = CMD_NET_TYPE;
> +			cmdrsp->net.type = NET_RCV_PROMISC;
> +			cmdrsp->net.enbdis.context = netdev;
> +			cmdrsp->net.enbdis.enable =
> +			    (netdev->flags & IFF_PROMISC);
> +			if (uisqueue_put_cmdrsp_with_lock_client
> +			    (vnicinfo->datachan.chinfo.queueinfo, cmdrsp,
> +			     IOCHAN_TO_IOPART,
> +			     (void *)&vnicinfo->datachan.chinfo.insertlock,
> +			     DONT_ISSUE_INTERRUPT, (uint64_t)NULL,
> +			     0 /* don't wait */ , "vnic")) {
> +				vnicinfo->datachan.chstat.sent_promisc++;
> +			} else
> +				LOGERRNAME(vnicinfo->netdev,
> +					   "**** FAILED to insert NET_RCV_PROMISC.\n");
> +			kfree(cmdrsp);
> +		}
> +
> +		vnicinfo->old_flags = netdev->flags;
> +	}
> +	DBGINF("exiting virtnic_set_multi\n");
> +}
> +
> +/*****************************************************/
> +/* debugfs filesystem functions			     */
> +/*****************************************************/
> +
> +static ssize_t info_debugfs_read(struct file *file,
> +				 char __user *buf, size_t len, loff_t *offset)
> +{
> +	int i;
> +	ssize_t bytes_read = 0;
> +	int str_pos = 0;
> +	struct virtnic_info *vni;
> +	char *vbuf;
> +
> +	if (len > MAX_BUF)
> +		len = MAX_BUF;
> +	vbuf = kzalloc(len, GFP_KERNEL);
> +	if (!vbuf)
> +		return -ENOMEM;
> +
> +	/* for each vnic channel
> +	 * dump out channel specific data
> +	 */
> +	for (i = 0; i < VIRTNICSOPENMAX; i++) {
> +		if (num_virtnic_open[i].netdev == NULL)
> +			continue;
> +
> +		vni = num_virtnic_open[i].vnicinfo;
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, "Vnic i = %d\n", i);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, "netdev = %s (0x%p), MAC Addr: %02x:%02x:%02x:%02x:%02x:%02x\n",
> +			num_virtnic_open[i].netdev->name,
> +			num_virtnic_open[i].netdev,
> +			num_virtnic_open[i].netdev->dev_addr[0],
> +			num_virtnic_open[i].netdev->dev_addr[1],
> +			num_virtnic_open[i].netdev->dev_addr[2],
> +			num_virtnic_open[i].netdev->dev_addr[3],
> +			num_virtnic_open[i].netdev->dev_addr[4],
> +			num_virtnic_open[i].netdev->dev_addr[5]);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, "vnicinfo = 0x%p\n", vni);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " num_rcv_bufs = %d\n",
> +			vni->num_rcv_bufs);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " features = 0x%016llX\n",
> +			(uint64_t)readq(&vni->datachan.chinfo.queueinfo->chan->
> +				features));
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " max_outstanding_net_xmits = %d\n",
> +			vni->max_outstanding_net_xmits);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " upper_threshold_net_xmits = %d\n",
> +			vni->upper_threshold_net_xmits);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " lower_threshold_net_xmits = %d\n",
> +			vni->lower_threshold_net_xmits);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " queuefullmsg_logged = %d\n",
> +			vni->queuefullmsg_logged);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " queueinfo->packets_sent = %lld\n",
> +			vni->datachan.chinfo.queueinfo->packets_sent);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " queueinfo->packets_received = %lld\n",
> +			vni->datachan.chinfo.queueinfo->packets_received);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.got_rcv = %lu\n",
> +			vni->datachan.chstat.got_rcv);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.got_enbdisack = %lu\n",
> +			vni->datachan.chstat.got_enbdisack);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.got_xmit_done = %lu\n",
> +			vni->datachan.chstat.got_xmit_done);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.xmit_fail = %lu\n",
> +			vni->datachan.chstat.xmit_fail);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.sent_enbdis = %lu\n",
> +			vni->datachan.chstat.sent_enbdis);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.sent_promisc = %lu\n",
> +			vni->datachan.chstat.sent_promisc);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.sent_post = %lu\n",
> +			vni->datachan.chstat.sent_post);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.sent_xmit = %lu\n",
> +			vni->datachan.chstat.sent_xmit);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.reject_count = %lu\n",
> +			vni->datachan.chstat.reject_count);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " chstat.extra_rcvbufs_sent = %lu\n",
> +			vni->datachan.chstat.extra_rcvbufs_sent);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " n_rcv0 = %lu\n", vni->n_rcv0);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " n_rcv1 = %lu\n", vni->n_rcv1);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " n_rcv2 = %lu\n", vni->n_rcv2);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " n_rcvx = %lu\n", vni->n_rcvx);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " num_rcv_bufs_in_iovm = %d\n",
> +			atomic_read(&vni->num_rcv_bufs_in_iovm));
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " alloc_failed_in_if_needed_cnt = %lu\n",
> +			vni->alloc_failed_in_if_needed_cnt);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " alloc_failed_in_repost_return_cnt = %lu\n",
> +			vni->alloc_failed_in_repost_return_cnt);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " inner_loop_limit_reached_cnt = %lu\n",
> +			vni->inner_loop_limit_reached_cnt);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " found_repost_rcvbuf_cnt = %lu\n",
> +			vni->found_repost_rcvbuf_cnt);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " repost_found_skb_cnt = %lu\n",
> +			vni->repost_found_skb_cnt);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " n_repost_deficit = %lu\n",
> +			vni->n_repost_deficit);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " bad_rcv_buf = %lu\n",
> +			vni->bad_rcv_buf);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " n_rcv_packet_not_accepted = %lu\n",
> +			vni->n_rcv_packet_not_accepted);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " interrupts_rcvd = %llu\n",
> +			vni->interrupts_rcvd);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " interrupts_notme = %llu\n",
> +			vni->interrupts_notme);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " interrupts_disabled = %llu\n",
> +			vni->interrupts_disabled);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " busy_cnt = %llu\n",
> +			vni->busy_cnt);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " flow_control_upper_hits = %llu\n",
> +			vni->flow_control_upper_hits);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " flow_control_lower_hits = %llu\n",
> +			vni->flow_control_lower_hits);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " thread_wait_ms = %d\n",
> +			vni->thread_wait_ms);
> +		str_pos += scnprintf(vbuf + str_pos,
> +				len - str_pos, " netif_queue = %s\n",
> +			netif_queue_stopped(vni->netdev) ?
> +			"stopped" : "running");
> +	}
> +	bytes_read = simple_read_from_buffer(buf, len, offset, vbuf, str_pos);
> +	kfree(vbuf);
> +	return bytes_read;
> +}
> +
> +static ssize_t enable_ints_write(struct file *file,
> +				 const char __user *buffer,
> +				 size_t count, loff_t *ppos)
> +{
> +	char buf[4];
> +	int i, new_value;
> +	struct virtnic_info *vnicinfo;
> +	uint64_t __iomem *features_addr;
> +	uint64_t mask;
> +
> +	if (count >= ARRAY_SIZE(buf))
> +		return -EINVAL;
> +
> +	buf[count] = '\0';
> +	if (copy_from_user(buf, buffer, count)) {
> +		LOGERR("copy_from_user failed.\n");
> +		return -EFAULT;
> +	}
> +
> +	i = kstrtoint(buf, 10 , &new_value);
> +
> +	if (i != 0) {
> +		LOGERR("Failed to scan value for enable_ints, buf<<%.*s>>",
> +		       (int)count, buf);
> +		return -EFAULT;
> +	}
> +
> +	 /* set all counts to new_value usually 0 */
> +	for (i = 0; i < VIRTNICSOPENMAX; i++) {
> +		if (num_virtnic_open[i].vnicinfo != NULL) {
> +			vnicinfo = num_virtnic_open[i].vnicinfo;
> +			features_addr =
> +				&vnicinfo->datachan.chinfo.queueinfo->chan->
> +				features;
> +			if (new_value == 1) {
> +				mask =
> +				    ~(ULTRA_IO_CHANNEL_IS_POLLING |
> +				      ULTRA_IO_DRIVER_DISABLES_INTS);
> +				uisqueue_interlocked_and(features_addr, mask);
> +				mask = ULTRA_IO_DRIVER_ENABLES_INTS;
> +				uisqueue_interlocked_or(features_addr, mask);
> +				vnicinfo->thread_wait_ms = 2000;
> +			} else {
> +				mask =
> +					~(ULTRA_IO_DRIVER_ENABLES_INTS |
> +					ULTRA_IO_DRIVER_DISABLES_INTS);
> +				uisqueue_interlocked_and(features_addr, mask);
> +				mask = ULTRA_IO_CHANNEL_IS_POLLING;
> +				uisqueue_interlocked_or(features_addr, mask);
> +				vnicinfo->thread_wait_ms = 2;
> +			}
> +		}
> +	}
> +
> +	return count;
> +}
> +
> +/*****************************************************/
> +/* Module init & exit functions                      */
> +/*****************************************************/
> +
> +static int __init
> +virtnic_mod_init(void)
> +{
> +	int error, i;
> +
> +	LOGINF("entering virtnic_mod_init");
> +	/* ASSERT RCVPOST_BUF_SIZE < 4K */
> +	if (RCVPOST_BUF_SIZE > PI_PAGE_SIZE) {
> +		LOGERR("**** FAILED RCVPOST_BUF_SIZE:%d larger than a page\n",
> +		       RCVPOST_BUF_SIZE);
> +		return -1;
> +	}
> +	/* ASSERT RCVPOST_BUF_SIZE is big enough to hold eth header */
> +	if (RCVPOST_BUF_SIZE < ETH_HEADER_SIZE) {
> +		LOGERR("**** FAILED RCVPOST_BUF_SIZE:%d is < ETH_HEADER_SIZE:%d\n",
> +		       RCVPOST_BUF_SIZE, ETH_HEADER_SIZE);
> +		return -1;
> +	}
> +
> +	/* clear out array */
> +	for (i = 0; i < VIRTNICSOPENMAX; i++) {
> +		num_virtnic_open[i].netdev = NULL;
> +		num_virtnic_open[i].vnicinfo = NULL;
> +	}
> +	/* create workqueue for serverdown completion */
> +	virtnic_serverdown_workqueue =
> +	    create_singlethread_workqueue("virtnic_serverdown");
> +	if (virtnic_serverdown_workqueue == NULL) {
> +		LOGERR("**** FAILED virtnic_serverdown_workqueue creation\n");
> +		return -1;
> +	}
> +	/* create workqueue for tx timeout reset  */
> +	virtnic_timeout_reset_workqueue =
> +	    create_singlethread_workqueue("virtnic_timeout_reset");
> +	if (virtnic_timeout_reset_workqueue == NULL) {
> +		LOGERR
> +		    ("**** FAILED virtnic_timeout_reset_workqueue creation\n");
> +		return -1;
> +	}
> +	virtnic_debugfs_dir = debugfs_create_dir("virtnic", NULL);
> +	debugfs_create_file("info", S_IRUSR, virtnic_debugfs_dir,
> +			    NULL, &debugfs_info_fops);
> +	debugfs_create_file("enable_ints", S_IWUSR,
> +			    virtnic_debugfs_dir, NULL,
> +			    &debugfs_enable_ints_fops);
> +
> +	error = virtpci_register_driver(&virtnic_driver);
> +	if (error < 0) {
> +		LOGERR("**** FAILED to register driver %x\n", error);
> +		debugfs_remove_recursive(virtnic_debugfs_dir);
> +		return -1;
> +	}
> +	LOGINF("exiting virtnic_mod_init");
> +	return error;
> +}
> +
> +static void __exit
> +virtnic_mod_exit(void)
> +{
> +	LOGINF("entering virtnic_mod_exit...\n");
> +	virtpci_unregister_driver(&virtnic_driver);
> +	/* unregister is going to call virtnic_remove for all devices */
> +	/* destroy serverdown completion workqueue */
> +	if (virtnic_serverdown_workqueue) {
> +		destroy_workqueue(virtnic_serverdown_workqueue);
> +		virtnic_serverdown_workqueue = NULL;
> +	}
> +
> +	/* destroy timeout reset workqueue */
> +	if (virtnic_timeout_reset_workqueue) {
> +		destroy_workqueue(virtnic_timeout_reset_workqueue);
> +		virtnic_timeout_reset_workqueue = NULL;
> +	}
> +
> +	debugfs_remove_recursive(virtnic_debugfs_dir);
> +	LOGINF("exiting virtnic_mod_exit...\n");
> +}
> +
> +module_init(virtnic_mod_init);
> +module_exit(virtnic_mod_exit);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Usha Srinivasan");
> +MODULE_ALIAS("uisvirtnic");
> +/* this is extracted during depmod and kept in modules.dep */
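
For reference, the high-watermark test in virtnic_xmit() above can be tried out in userspace. The following is only a sketch and not code from the patch: outstanding_two_branch() mirrors the two-branch comparison as posted, and outstanding_modular() is an assumed alternative that relies on unsigned wrap-around. Both helper names are hypothetical.

```c
#include <limits.h>

/*
 * Mirrors the comparison in the posted patch: the wrapped case
 * (sent < done) is handled by a separate expression.
 */
static unsigned long outstanding_two_branch(unsigned long sent,
					    unsigned long done)
{
	if (sent >= done)
		return sent - done;
	/* wrap case, written the same way as in the patch */
	return ULONG_MAX - done + sent;
}

/*
 * Unsigned arithmetic in C is performed modulo ULONG_MAX + 1, so a
 * plain subtraction already yields the number of outstanding xmits,
 * even after the sent counter has wrapped past the done counter.
 */
static unsigned long outstanding_modular(unsigned long sent,
					 unsigned long done)
{
	return sent - done;
}
```

With done = ULONG_MAX - 1 and five more transmits (so sent has wrapped to 3), the plain subtraction reports 5 outstanding, while the two-branch form reports 4, because the distance across the wrap is ULONG_MAX - done + 1 + sent. Whether that difference matters for the flow-control threshold is a judgment call for the driver authors.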

^ permalink raw reply

* Re: [PATCH] bonding: avoid re-entry of bond_release
From: Wengang @ 2014-12-22  8:30 UTC (permalink / raw)
  To: Ding Tianhong, Andy Gospodarek; +Cc: netdev
In-Reply-To: <54977C52.3060309@huawei.com>

OK. Will change as suggested and re-post.

thanks,
wengang

On 2014-12-22 10:05, Ding Tianhong wrote:
> On 2014/12/22 9:09, Wengang wrote:
>> Hi Andy and Ding,
>>
>> Thanks for your reviews!
> >> In the ioctl path, removing an interface that is not currently a slave
> >> can happen from user space (by mistake), so we should avoid the noisy message.
> >>
> >> However, __bond_release_one() has another call path, from bond_uninit().
> >> In the latter case, it should be treated as an error if the interface does not
> >> have the IFF_SLAVE flag. The message is printed to signal that the error
> >> occurred, so I think the message is needed for this path.
> >>
> >> What do you think?
>>
> Just like the bond_enslave(), it is only a warning.
>
> Ding
>
>> thanks,
>> wengang
>>
> >> On 2014-12-21 10:01, Ding Tianhong wrote:
>>> On 2014/12/19 23:11, Andy Gospodarek wrote:
>>>> On Fri, Dec 19, 2014 at 04:56:57PM +0800, Wengang Wang wrote:
> >>>>> If bond_release is run against an interface which is already detached from
> >>>>> its master, then there is an error message shown like
>>>>>      "<master name> cannot release <slave name>".
>>>>>
>>>>> The call path is:
>>>>>      bond_do_ioctl()
>>>>>          bond_release()
>>>>>              __bond_release_one()
>>>>>
> >>>>> Though it does no real harm, the message is misleading.
>>>>> This patch tries to avoid the message.
>>>>>
>>>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>>>> ---
>>>>>    drivers/net/bonding/bond_main.c | 5 ++++-
>>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>>> index 184c434..4a71bbd 100644
>>>>> --- a/drivers/net/bonding/bond_main.c
>>>>> +++ b/drivers/net/bonding/bond_main.c
>>>>> @@ -3256,7 +3256,10 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
>>>>>            break;
>>>>>        case BOND_RELEASE_OLD:
>>>>>        case SIOCBONDRELEASE:
>>>>> -        res = bond_release(bond_dev, slave_dev);
>>>>> +        if (slave_dev->flags & IFF_SLAVE)
>>>>> +            res = bond_release(bond_dev, slave_dev);
>>>>> +        else
>>>>> +            res = 0;
>>>> Functionally this patch is fine, but I would prefer that you simply
>>>> change the check in __bond_release_one to not be so noisy.  There is a
>>>> check[1] in bond_enslave to see if a slave is already in a bond and that
>>>> just prints a message of netdev_dbg (rather than netdev_err) and it
>>>> seems that would be appropriate for this type of message.
>>>>
>>>> [1] from bond_enslave():
>>>>
>>>>           /* already enslaved */
>>>>           if (slave_dev->flags & IFF_SLAVE) {
>>>>                   netdev_dbg(bond_dev, "Error: Device was already enslaved\n");
>>>>                   return -EBUSY;
>>>>           }
>>>>
>>>>
>>>>>            break;
>>>>>        case BOND_SETHWADDR_OLD:
>>>>>        case SIOCBONDSETHWADDR:
>>>>> -- 
> >>> Agreed, using netdev_dbg looks better and is sufficient.
>>>
>>> Ding
>>>
>>>
>>

^ permalink raw reply

* Re: [PATCH net] net/mlx4_en: Doorbell is byteswapped in Little Endian archs
From: Amir Vadai @ 2014-12-22  8:17 UTC (permalink / raw)
  To: Sergei Shtylyov, David S. Miller
  Cc: netdev@vger.kernel.org, Or Gerlitz, Yevgeny Petrilin, Wei Yang,
	David Laight
In-Reply-To: <5497254E.6090603@cogentembedded.com>

On 12/21/2014 9:53 PM, Sergei Shtylyov wrote:
> Hello.
> 
Hi,

> On 12/21/2014 9:18 PM, Amir Vadai wrote:
> 
> >> iowrite32() will byteswap its argument on big endian archs.
>> iowrite32be() will byteswap on little endian archs.
>> Since we don't want to do this unnecessary byteswap on the fast path,
>> doorbell is stored in the NIC's native endianness. Using the right
>> iowrite() according to the arch endianness.
> 
>> CC: Wei Yang <weiyang@linux.vnet.ibm.com>
>> CC: David Laight <david.laight@aculab.com>
>> Fixes: 6a4e812 ("net/mlx4_en: Avoid calling bswap in tx fast path")
>> Signed-off-by: Amir Vadai <amirv@mellanox.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx4/en_tx.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> index a308d41..6477cc7 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> @@ -962,7 +962,16 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>>   		tx_desc->ctrl.owner_opcode = op_own;
>>   		if (send_doorbell) {
>>   			wmb();
>> -			iowrite32(ring->doorbell_qpn,
>> +		/* Since there is no iowrite*_native() that writes the value
>> +		 * as is, without byteswapping - using the one the doesn't do
>> +		 * byteswapping in the relevant arch endianness.
>> +		 */
> 
>     Why the comment is not aligned with the code?
By mistake. Sending V1.

> 
>> +#if defined(__LITTLE_ENDIAN)
>> +			iowrite32(
>> +#else
>> +			iowrite32be(
>> +#endif
> 
>     Ugh...
Yes I think so too, but there is no iowrite*_native(). I plan to send a
patch to add it, but meanwhile the driver is completely broken in little
endian archs - so it must be fixed now. And just reverting to the old
behavior of bswap in the fast path looks like a bad alternative too.

> 
>> +				  ring->doorbell_qpn,
>>   				  ring->bf.uar->map + MLX4_SEND_DOORBELL);
> [...]
> 
> WBR, Sergei
> 
Amir

^ permalink raw reply

* [PATCH net V1] net/mlx4_en: Doorbell is byteswapped in Little Endian archs
From: Amir Vadai @ 2014-12-22  8:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Or Gerlitz, Yevgeny Petrilin, Amir Vadai, Wei Yang,
	David Laight

iowrite32() will byteswap its argument on big endian archs.
iowrite32be() will byteswap on little endian archs.
Since we don't want to do this unnecessary byteswap on the fast path,
the doorbell is stored in the NIC's native endianness. Use the iowrite()
variant that does not byteswap on the current arch endianness.

CC: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: David Laight <david.laight@aculab.com>
Fixes: 6a4e812 ("net/mlx4_en: Avoid calling bswap in tx fast path")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
Change from V0:
- Fixed indentation of comment

 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index a308d41..e3357bf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -962,7 +962,17 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		tx_desc->ctrl.owner_opcode = op_own;
 		if (send_doorbell) {
 			wmb();
-			iowrite32(ring->doorbell_qpn,
+			/* Since there is no iowrite*_native() that writes the
+			 * value as is, without byteswapping - using the one
+			 * that doesn't do byteswapping in the relevant arch
+			 * endianness.
+			 */
+#if defined(__LITTLE_ENDIAN)
+			iowrite32(
+#else
+			iowrite32be(
+#endif
+				  ring->doorbell_qpn,
 				  ring->bf.uar->map + MLX4_SEND_DOORBELL);
 		} else {
 			ring->xmit_more++;
-- 
1.9.3
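
The reasoning in the description above can be checked with a small userspace model. This is only a sketch under stated assumptions: the model_* helpers and ring_doorbell() are hypothetical stand-ins, not kernel APIs. They model iowrite32() as "the device sees the value in little-endian byte order" and iowrite32be() as "the device sees it in big-endian byte order", with the host endianness passed in as a flag so both cases can be tried on one machine.

```c
#include <stdint.h>

static uint32_t model_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of cpu_to_be32(): swaps only on a little-endian host. */
static uint32_t model_cpu_to_be32(uint32_t v, int host_is_le)
{
	return host_is_le ? model_bswap32(v) : v;
}

/* Model of iowrite32(): register bytes end up in little-endian order. */
static void model_iowrite32(uint32_t v, uint8_t reg[4])
{
	reg[0] = (uint8_t)v;
	reg[1] = (uint8_t)(v >> 8);
	reg[2] = (uint8_t)(v >> 16);
	reg[3] = (uint8_t)(v >> 24);
}

/* Model of iowrite32be(): register bytes end up in big-endian order. */
static void model_iowrite32be(uint32_t v, uint8_t reg[4])
{
	reg[0] = (uint8_t)(v >> 24);
	reg[1] = (uint8_t)(v >> 16);
	reg[2] = (uint8_t)(v >> 8);
	reg[3] = (uint8_t)v;
}

/*
 * The doorbell value is stored already converted to big endian, so it
 * must reach the device without a second swap: pick the accessor that
 * does not swap on the given host.
 */
static void ring_doorbell(uint32_t doorbell_be, int host_is_le,
			  uint8_t reg[4])
{
	if (host_is_le)
		model_iowrite32(doorbell_be, reg);
	else
		model_iowrite32be(doorbell_be, reg);
}
```

In this model, a doorbell of 0x12345678 converted with model_cpu_to_be32() lands in the register as bytes 12 34 56 78 for either host endianness, which is what the #if in the patch is meant to achieve.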

^ permalink raw reply related

* Re: virtio_net: Fix napi poll list corruption
From: Jason Wang @ 2014-12-22  8:18 UTC (permalink / raw)
  To: Herbert Xu, David Vrabel
  Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet,
	David S. Miller
In-Reply-To: <20141220002327.GA31975@gondor.apana.org.au>


On 12/20/2014 08:23 AM, Herbert Xu wrote:
> David Vrabel <david.vrabel@citrix.com> wrote:
>> After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt
>> masking in NAPI) the napi instance is removed from the per-cpu list
>> prior to calling the n->poll(), and is only requeued if all of the
>> budget was used.  This inadvertently broke netfront because netfront
>> does not use NAPI correctly.
> A similar bug exists in virtio_net.
>
> -- >8 --
> The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
> interrupt masking in NAPI) breaks virtio_net in an insidious way.
>
> It is now required that if the entire budget is consumed when poll
> returns, the napi poll_list must remain empty.  However, like some
> other drivers virtio_net tries to do a last-ditch check and if
> there is more work it will call napi_schedule and then immediately
> process some of this new work.  Should the entire budget be consumed
> while processing such new work then we will violate the new caller
> contract.
>
> This patch fixes this by not touching any work when we reschedule
> in virtio_net.
>
> The worst part of this bug is that the list corruption causes other
> napi users to be moved off-list.  In my case I was chasing a stall
> in IPsec (IPsec uses netif_rx) and I only belatedly realised that it
> was virtio_net which caused the stall even though the virtio_net
> poll was still functioning perfectly after IPsec stalled.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b8bd719..5ca9771 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -760,7 +760,6 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>  		container_of(napi, struct receive_queue, napi);
>  	unsigned int r, received = 0;
>  
> -again:
>  	received += virtnet_receive(rq, budget - received);
>  
>  	/* Out of packets? */
> @@ -771,7 +770,6 @@ again:
>  		    napi_schedule_prep(napi)) {
>  			virtqueue_disable_cb(rq->vq);
>  			__napi_schedule(napi);
> -			goto again;
>  		}
>  	}
>
> Cheers,

Acked-by: Jason Wang <jasowang@redhat.com>

btw, looks like at least caif_virtio has the same issue.
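
The caller contract described in the patch can be sketched in user space. This is only a model under stated assumptions, not the driver code: `struct fake_queue`, `fake_receive()`, `fake_poll()` and `run_once()` are hypothetical stand-ins for the receive queue, `virtnet_receive()`, `virtnet_poll()` and the core poll loop, and `on_list` stands in for membership of the per-cpu poll list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a receive queue plus its NAPI state. */
struct fake_queue {
	int pending;        /* packets waiting in the ring */
	int late_arrivals;  /* packets that race in before callbacks are re-enabled */
	bool on_list;       /* models "queued on the per-cpu poll list" */
};

/* Consume up to budget packets and return how many were taken. */
static int fake_receive(struct fake_queue *q, int budget)
{
	int n = q->pending < budget ? q->pending : budget;

	q->pending -= n;
	return n;
}

/* Fixed shape of the poll function: reschedule, but never "goto again". */
static int fake_poll(struct fake_queue *q, int budget)
{
	int received = fake_receive(q, budget);

	if (received < budget) {
		/* Out of packets: complete, then do the last-ditch check. */
		q->pending += q->late_arrivals;
		q->late_arrivals = 0;
		if (q->pending > 0)
			q->on_list = true;  /* reschedule only; the work waits for the next poll */
	}
	return received;
}

/* Models the core loop: dequeue, poll, requeue if the budget was used up. */
static int run_once(struct fake_queue *q, int budget)
{
	int work;

	q->on_list = false;
	work = fake_poll(q, budget);
	/* The contract: a full budget means poll must not have queued itself. */
	assert(!(work == budget && q->on_list));
	if (work == budget)
		q->on_list = true;
	return work;
}
```

With the removed `goto again`, the model's `fake_poll()` could both reschedule itself and return the full budget, which is the case the assertion in `run_once()` rejects.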

^ permalink raw reply

* [PATCH net-next v2 1/2] r8152: call rtl_start_rx after netif_carrier_on
From: Hayes Wang @ 2014-12-22  6:52 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-110-Taiwan-albertk@realtek.com>

Remove rtl_start_rx() from rtl_enable() and put it after calling
netif_carrier_on().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 2d1c77e..cbe450c 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -2043,7 +2043,7 @@ static int rtl_enable(struct r8152 *tp)
 
 	rxdy_gated_en(tp, false);
 
-	return rtl_start_rx(tp);
+	return 0;
 }
 
 static int rtl8152_enable(struct r8152 *tp)
@@ -2858,6 +2858,7 @@ static void set_carrier(struct r8152 *tp)
 			tp->rtl_ops.enable(tp);
 			set_bit(RTL8152_SET_RX_MODE, &tp->flags);
 			netif_carrier_on(netdev);
+			rtl_start_rx(tp);
 		}
 	} else {
 		if (tp->speed & LINK_STATUS) {
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next v2 2/2] r8152: check the status before submitting rx
From: Hayes Wang @ 2014-12-22  6:52 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-110-Taiwan-albertk@realtek.com>

Don't submit rx if the device is unplugged or stopped, or if the
link is down.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index cbe450c..8ecc2df 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1789,6 +1789,11 @@ int r8152_submit_rx(struct r8152 *tp, struct rx_agg *agg, gfp_t mem_flags)
 {
 	int ret;
 
+	/* The rx would be stopped, so skip submitting */
+	if (test_bit(RTL8152_UNPLUG, &tp->flags) ||
+	    !test_bit(WORK_ENABLE, &tp->flags) || !netif_carrier_ok(tp->netdev))
+		return 0;
+
 	usb_fill_bulk_urb(agg->urb, tp->udev, usb_rcvbulkpipe(tp->udev, 1),
 			  agg->head, agg_buf_sz,
 			  (usb_complete_t)read_bulk_callback, agg);
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next v2 0/2] r8152: adjust r8152_submit_rx
From: Hayes Wang @ 2014-12-22  6:52 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-107-Taiwan-albertk@realtek.com>

v2:
Replace patch #1 with "call rtl_start_rx after netif_carrier_on".

For patch #2, check netif_carrier_ok() instead of tp->speed.

v1:
Prevent r8152_submit_rx() from submitting rx at unexpected
moments. This could reduce the time needed to stop rx.

For patch #1, tp->speed should be updated early, so that
patch #2 can use it to check the current link status.

Hayes Wang (2):
  r8152: call rtl_start_rx after netif_carrier_on
  r8152: check the status before submitting rx

 drivers/net/usb/r8152.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.1.0

^ permalink raw reply

* Re: SRIOV as bridge Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Roopa Prabhu @ 2014-12-22  6:24 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: John Fastabend, Hubert Sokolowski, netdev@vger.kernel.org,
	Vlad Yasevich, Shrijeet Mukherjee
In-Reply-To: <54978C3C.4040102@mojatatu.com>

On 12/21/14, 7:13 PM, Jamal Hadi Salim wrote:
> On 12/21/14 15:46, Roopa Prabhu wrote:
>> On 12/21/14, 12:06 PM, Jamal Hadi Salim wrote:
>
>> yes, could be, but its not today ('PF' is physical function and 'VF' is
>> virtual function).
>> If you introduce a master/slave relationship between the PF and VF (ie
>> VF's were assigned PF as the master using 'ip link set dev vf master
>> PF), then yes.
>
>
> When someone says "modprobe igb max_vfs=19" then 19 VFs show up. i.e the
> driver creates them. And then there is assumed direct relationship
> between VF and PF. The PF being the parent. Adding fdbs goes via PF.
>
>>> And if the path is via the PF - I think that seems like "master"
>>> not self, no?
>>
>> Today ...when you add fdb...path is not via the PF netdev.
>
> For SRIOV it is. Example to add via pf eth10 an
> fdb entry to the igb hardware fdb to point to vf1:
> ip link set eth10 vf 1 mac aa:bb:cc:dd:ee:ff vlan 10
> That last part "vf 1 mac aa:bb:cc:dd:ee:ff vlan 10" is typically
> part of an "fdb add semantic" - but we explicitly call out
> eth10, the parent. The PF has control of the hardware fdb.

Ah, I did not know this syntax with 'ip link set'. Thanks for
pointing it out.
I always thought that you could still use 'bridge fdb add' for VFs.
Curious why it's not that way.

>
>> It may be
>> internally done that way in the PF/VF driver.
>> so, 'master' does not apply today. But if there were such a relationship
>> between PF/VF, yes, 'master' could be used.
>>
>
> I am referring to the case where we get rid of using iplink. There has
> to be something pointed to by vf1 that gets called to add the fdb entry
> in hardware.
OK, I assumed we were only talking about 'bridge fdb add'.
>
>> PF does not really need to have a master relationship with the VF. Its
>> better that way. Infact it should be that way even in the case of 'the
>> switch device class model' because that will allow switch ports to be
>> added to a linux bridge (and hence make use of the linux bridge (cumulus
>> model). 'master' will be the 'linux bridge device' in this case).
>>
>
> So what do you do if the user sets either one of master/self and it
> doesn't make sense?

I am guessing it will continue to do what it does today. If there is no
master, or if there is a master that does not support the op, it will
return -EOPNOTSUPP. Likewise, 'self' does not make sense when the port
driver does not support the op, in which case you will again get
-EOPNOTSUPP. I have not thought through all the other cases yet.
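
The behaviour guessed at here can be sketched as a small user-space model. Everything in it is an assumption made for illustration: `struct fake_port`, the `FAKE_FLAG_*` values and `fake_fdb_add()` are hypothetical names, not the kernel's rtnetlink code, and a request with neither flag set is simply rejected.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature for a driver's "add fdb entry" operation. */
typedef int (*fdb_add_fn)(const unsigned char *mac);

/* Hypothetical port: an optional master device and an optional op. */
struct fake_port {
	struct fake_port *master;  /* e.g. a bridge device, or NULL */
	fdb_add_fn fdb_add;        /* NULL when the driver lacks the op */
};

#define FAKE_FLAG_MASTER 0x1u
#define FAKE_FLAG_SELF   0x2u

/* Dispatch an fdb add to the master, to the port itself, or to both. */
static int fake_fdb_add(struct fake_port *port, unsigned int flags,
			const unsigned char *mac)
{
	int err = -EOPNOTSUPP;

	if (flags & FAKE_FLAG_MASTER) {
		/* No master, or a master without the op: not supported. */
		if (!port->master || !port->master->fdb_add)
			return -EOPNOTSUPP;
		err = port->master->fdb_add(mac);
		if (err)
			return err;
	}
	if (flags & FAKE_FLAG_SELF) {
		/* The port driver itself lacks the op: not supported. */
		if (!port->fdb_add)
			return -EOPNOTSUPP;
		err = port->fdb_add(mac);
	}
	return err;
}

/* Trivial op that accepts every entry; used to model a capable driver. */
static int fake_accept(const unsigned char *mac)
{
	(void)mac;
	return 0;
}
```

In this model a VF netdev with a bridge as master but no op of its own accepts a "master" request and rejects a "self" request, which matches the two -EOPNOTSUPP cases described.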

Thanks,
Roopa

^ permalink raw reply

* Re: [PATCH net-next 2/2] r8152: check the status before submitting rx
From: David Miller @ 2014-12-22  5:22 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2ED5A35@RTITMBSV03.realtek.com.tw>

From: Hayes Wang <hayeswang@realtek.com>
Date: Mon, 22 Dec 2014 02:53:42 +0000

>  David Miller [mailto:davem@davemloft.net] 
>> Sent: Saturday, December 20, 2014 4:44 AM
> [...]
>> > Don't submit the rx if the device is unplugged, linking down,
>> > or stopped.
>>  ...
>> > @@ -1789,6 +1789,11 @@ int r8152_submit_rx(struct r8152 
>> *tp, struct rx_agg *agg, gfp_t mem_flags)
>> >  {
>> >  	int ret;
>> >  
>> > +	/* The rx would be stopped, so skip submitting */
>> > +	if (test_bit(RTL8152_UNPLUG, &tp->flags) ||
>> > +	    !test_bit(WORK_ENABLE, &tp->flags) || !(tp->speed & LINK_STATUS))
>> > +		return 0;
>> > +
>> 
>> I think !netif_carrier_ok() should always be true in all three of those
>> situations, and would be a much simpler test than what you've coded
>> here.
> 
> When the device is unplugged or stopped, the linking status
> may be true, so I add additional checks to avoid the submission.
> 
> Besides, in set_carrier() I set netif_carrier_on() after
> ops.enable() to avoid any transmission before I finish
> starting the tx/rx.
> 
> 	tp->rtl_ops.enable(tp);
> 	set_bit(RTL8152_SET_RX_MODE, &tp->flags);
> 	netif_carrier_on(netdev);
> 
> However, the r8152_submit_rx() would be called in ops.enable(),
> and the check of netif_carrier_ok() would be always false. That
> is why I use tp->speed, not netif_carrier_ok(), to check the
> link status.

I still think your check is way too complicated for this fast path, so
I would ask that you arrange things such that the simpler
netif_carrier_ok() test works.

Especially because that is what the core networking stack uses
to decide whether to send packets to us as well.
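
The ordering issue under discussion can be illustrated with a user-space sketch. It is a model under assumptions, not r8152 code: `struct fake_dev`, `FAKE_RX_URBS` and the `fake_*`/`link_up_*` functions are hypothetical, and only the carrier check is modelled. Whether the unplugged and stopped states can also be covered by the carrier state is the open question in this thread and is left out.

```c
#include <stdbool.h>

#define FAKE_RX_URBS 4  /* hypothetical number of rx buffers */

/* Hypothetical device state: carrier flag and a count of submitted rx. */
struct fake_dev {
	bool carrier_ok;
	int submitted;
};

/* Skip the submission when the carrier is off; skipping is not an error. */
static int fake_submit_rx(struct fake_dev *d)
{
	if (!d->carrier_ok)
		return 0;
	d->submitted++;
	return 0;
}

/* Submit every rx buffer, stopping at the first error. */
static int fake_start_rx(struct fake_dev *d)
{
	for (int i = 0; i < FAKE_RX_URBS; i++) {
		int ret = fake_submit_rx(d);

		if (ret)
			return ret;
	}
	return 0;
}

/* Old order: rx is started inside enable(), before the carrier is on. */
static void link_up_old_order(struct fake_dev *d)
{
	fake_start_rx(d);
	d->carrier_ok = true;
}

/* New order: turn the carrier on first, then start rx. */
static void link_up_new_order(struct fake_dev *d)
{
	d->carrier_ok = true;
	fake_start_rx(d);
}
```

With the old order the carrier check silently drops every submission at link-up; with the new order all `FAKE_RX_URBS` buffers are submitted, so a carrier-based check becomes usable.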

^ permalink raw reply

* Re: [PATCH 01/10] core: Split out UFO6 support
From: Vlad Yasevich @ 2014-12-22  4:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, stefanha, ben,
	David Miller
In-Reply-To: <20141220210359.GA23262@redhat.com>

On 12/20/2014 04:03 PM, Michael S. Tsirkin wrote:
> On Fri, Dec 19, 2014 at 03:13:20PM -0500, Vlad Yasevich wrote:
>> On 12/18/2014 12:50 PM, Michael S. Tsirkin wrote:
>>> On Thu, Dec 18, 2014 at 07:35:46PM +0200, Michael S. Tsirkin wrote:
>>>>>> We should either generate our own ID,
>>>>>> like we always did, or make sure we don't accept
>>>>>> these packets.
>>>>>> Second option is almost sure to break userspace,
>>>>>> so it seems we must do the first one.
>>>>>>
>>>>>
>>>>> Right.  This was missing from packet sockets.  I can fix it.
>>>>>
>>>>> -vlad
>>>>
>>>> Also, this can't be a patch on top, since we don't
>>>> want bisect to give us configurations which
>>>> can BUG().
>>>
>>> So how about doing this in stages:
>>>
>>> 1. add helper that checks skb GSO type.
>>> If it is SKB_GSO_UDP, check for IPv6, and
>>> generate the fragment ID.
>>>
>>> Call this helper in
>>> 	- virtio net rx path
>>
>> Why do we need id on rx path?  Fragment ID should only be generated on tx.
> 
> So that all GSO_UDP6 packets have fragment ID as appropriate.
> It's similar to how we fill it on rx in tun, is it not?

Saying "rx in tun" always hurts my head.  We fill it in in tun_get_user(),
which then passes the packet to the kernel for forwarding.  The ID is later
used in the forwarding process.

I've been thinking about putting this fragment ID generation into the GSO
path so there is only one spot that ever needs to do this.  This way it
would only be done if the fragment ID is actually needed.  For most
guest-to-guest comms, the ID isn't needed.

> 
> 
>>> 	- tun rx path (get user)
>>> 	- macvtap tx path (get user)
>>> 	- packet socket tx path (get user)
>>
>> The rest makes sense.
>>>
>>> 2. Put back UFO flag everywhere: virtio,tun,macvtap,packet,
>>> since we can handle IPv6 now even if it's suboptimal.
>>>
>>> Above two seem like small patches, appropriate for stable.
>>
>> OK.
>>
>>>
>>> Next, work on a new feature to pass
>>> fragment ID in virtio header:
>>>
>>> 3. split UFO/UFO6 as you did here, but add code
>>> in above 4 places to convert UDP to UDP6,
>>
>> Doing so will adversely impact IPv6 UFO performance for legacy
>> guests.  I know how many times I've seen mail saying that patch xyz
>> caused an X% regression in someone's setup and how we've reverted or
>> tried to fix things to solve this.  If we go with this approach, the only
>> "fix" would be to upgrade the guest and that's not available to some users.
>>
>> -vlad
> 
> I think there's some misunderstanding here.
> 
> I merely mean that after split, host should always have
> SKB_GSO_UDP6 set for IPv6.
> 
> To make sure legacy userspace/guests don't notice changes,
> whenever we detect SKB_GSO_UDP6 we should set VIRTIO_NET_HDR_GSO_UDP,
> and whenever we get VIRTIO_NET_HDR_GSO_UDP we should set either
> SKB_GSO_UDP6 or SKB_GSO_UDP depending on IP type.

This is the part that introduced the regression.  By setting the gso_type
to UDP6, we trigger skb_gso_segment() to actually perform IPv6 fragmentation.

I've seen this when passing UDP traffic between two Fedora 19 systems
through a kernel that does the above.

-vlad
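
The header/skb type mapping proposed in the quoted text can be sketched as follows. This is an illustration under assumptions only: the `fake_*` enums and helpers are hypothetical, and in particular the UDP6 skb type exists only as a proposal in this patch series.

```c
#include <stdbool.h>

/* Hypothetical GSO type as seen in the virtio net header. */
enum fake_hdr_gso { FAKE_HDR_GSO_NONE, FAKE_HDR_GSO_UDP };

/* Hypothetical GSO type as seen inside the host after the UFO/UFO6 split. */
enum fake_skb_gso { FAKE_SKB_GSO_NONE, FAKE_SKB_GSO_UDP, FAKE_SKB_GSO_UDP6 };

/* Guest to host: one header type fans out by IP version. */
static enum fake_skb_gso fake_hdr_to_skb(enum fake_hdr_gso h, bool is_ipv6)
{
	if (h != FAKE_HDR_GSO_UDP)
		return FAKE_SKB_GSO_NONE;
	return is_ipv6 ? FAKE_SKB_GSO_UDP6 : FAKE_SKB_GSO_UDP;
}

/* Host to guest: both host types collapse to the single legacy header type. */
static enum fake_hdr_gso fake_skb_to_hdr(enum fake_skb_gso s)
{
	switch (s) {
	case FAKE_SKB_GSO_UDP:
	case FAKE_SKB_GSO_UDP6:
		return FAKE_HDR_GSO_UDP;
	default:
		return FAKE_HDR_GSO_NONE;
	}
}
```

The sketch only shows that a legacy guest never sees a new header value. It does not model the cost raised in the reply, namely that a host-side UDP6 type makes segmentation perform real IPv6 fragmentation.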

> 
> Given this clarification there's no reason to think
> old guests will regress, correct?
> 
>>> additionally, add code in
>>> 	- virtio net tx path
>>> 	- tun tx path (get user)
>>> 	- macvtap rx path (put user)
>>> 	- packet socket rx path (put user)
>>> to convert UDP6 to UDP.
>>>
>>> 	step 3 needs to be bisect-clean, somehow.
>>>
>>> 4. add new field in header, new feature bit for virtio net to gate it,
>>> new ioctls to tun,macvtap,packet socket.
>>>
>>> These two are more like optimizations, so not stable material.
>>>
>>>

^ permalink raw reply

* Re: SRIOV as bridge Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Jamal Hadi Salim @ 2014-12-22  3:13 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: John Fastabend, Hubert Sokolowski, netdev@vger.kernel.org,
	Vlad Yasevich, Shrijeet Mukherjee
In-Reply-To: <549731A1.3090307@cumulusnetworks.com>

On 12/21/14 15:46, Roopa Prabhu wrote:
> On 12/21/14, 12:06 PM, Jamal Hadi Salim wrote:

> yes, could be, but its not today ('PF' is physical function and 'VF' is
> virtual function).
> If you introduce a master/slave relationship between the PF and VF (ie
> VF's were assigned PF as the master using 'ip link set dev vf master
> PF), then yes.

When someone says "modprobe igb max_vfs=19" then 19 VFs show up. i.e the
driver creates them. And then there is assumed direct relationship
between VF and PF. The PF being the parent. Adding fdbs goes via PF.

>> And if the path is via the PF - I think that seems like "master"
>> not self, no?
>
> Today ...when you add fdb...path is not via the PF netdev.

For SRIOV it is. Example to add via pf eth10 an
fdb entry to the igb hardware fdb to point to vf1:
ip link set eth10 vf 1 mac aa:bb:cc:dd:ee:ff vlan 10
That last part "vf 1 mac aa:bb:cc:dd:ee:ff vlan 10" is typically
part of an "fdb add semantic" - but we explicitly call out
eth10, the parent. The PF has control of the hardware fdb.

> It may be
> internally done that way in the PF/VF driver.
> so, 'master' does not apply today. But if there were such a relationship
> between PF/VF, yes, 'master' could be used.
>

I am referring to the case where we get rid of using iplink. There has
to be something pointed to by vf1 that gets called to add the fdb entry
in hardware.

> PF does not really need to have a master relationship with the VF. Its
> better that way. Infact it should be that way even in the case of 'the
> switch device class model' because that will allow switch ports to be
> added to a linux bridge (and hence make use of the linux bridge (cumulus
> model). 'master' will be the 'linux bridge device' in this case).
>

So what do you do if the user sets either one of master/self and it
doesn't make sense?

cheers,
jamal

> Thanks,
> Roopa

^ permalink raw reply

* Re: SRIOV as bridge Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Jamal Hadi Salim @ 2014-12-22  2:59 UTC (permalink / raw)
  To: John Fastabend, Roopa Prabhu
  Cc: Hubert Sokolowski, netdev@vger.kernel.org, Vlad Yasevich,
	Shrijeet Mukherjee
In-Reply-To: <5497250F.2020906@gmail.com>

On 12/21/14 14:52, John Fastabend wrote:

> not a problem thanks for the response. I might try to document this
> somewhere if folks think it would be useful. Something describing
> how it works today would be helpful is my thought. Showing the
> various stacked cases and how messages get propagated. (Some cases
> being with bridge, without bridge, with bridge and multiple uplinks,
> with bridge + VLAN filtering, macvlan, SR-IOV + bridge + VMDQ, etc.)
>
> Its not a small task so likely won't get to it until after the new
> year.
>

I think this would be very useful.
We are looking for an L2 tutorial at netdev01 (I was looking at
Vlad and Roopa so far) so maybe we could split the work. Would
you be interested?
Yes, all those bridge and bridge like things like Vxlan for example
should be part of this de-mystification.
Shrijeet is also putting together a BOF for the offload stuff.

>
> Yes, but I don't think its too late to bring it into the picture here.
>

that would be nice.

>>> The PF port, when acting as the control interface, is actually
>>> TheClassThingy we discuss on/off.
>
> Yep or if you take Jiri's approach any port on the nic could be used
> to manage this.
>
> The VF's may have netdev's if they are in the host. In this case you
> could use 'bridge fdb' to manage them. In many use cases though the
> VFs are directly assigned to VMs and then are outside the hosts
> management domain. For this case you can either let the host tell the
> driver which addresses it would want to receive.
>

Which driver? The PF? I don't think the semantics allow for that.
How do I tell the PF using "fdb add" that a specific mac+vlan should
be sent to vf1, which is now sitting inside some VM?

> Another _idea_ would be to create a "shadow" netdev in the host
> to manage the port even when the VF is direct assigned.

That may work. But there are questions:
Can its name be changed within the container/vm or are those
different namespaces etc.
So if the answer is that "self" implies using the hardware fdb,
what does "master" mean?

>Then you
> could use all your normal commands from the host to set the MTU,
> set any MACs, etc. At the moment as Jamal noted we have a subset
> of 'ip link' commands that we use to work on VFs when they are not
> in the host domain.
>
>      'ip link set ethx vf # ...'
>

I think once it moves, management of such things as MTUs should be done
from wherever that thing sits (vm/container) to avoid any surprises.

> In the SR-IOV case you would have a PF and then a set of eth-vf#
> netdev's which are not attached to a VF but act as the management
> interface for the port.
>

you can have more than one PF?

> I think this is not specific to SR-IOV though right.This is the
> same point for both "real" switch ASICs and SR-IOV. Using the netdev
> directly as a management interface (a la rocker) seems to work OK.
> But does it become cleaner to have the switch object represented
> explicitly for management.
>

Indeed it does. In particular if you had to move around a bridge port
to a container or VM.
Now if we introduced the idea of showing up with netdev for each vf
and you had a classthing you dont have to keep the vf1s visible
once migrated - but would be able to add fdb entries pointing to
them (assuming names/ifindices remain valid).

cheers,
jamal

^ permalink raw reply

* RE: [PATCH net-next 2/2] r8152: check the status before submitting rx
From: Hayes Wang @ 2014-12-22  2:53 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, nic_swsd, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org
In-Reply-To: <20141219.154402.54691026682141762.davem@davemloft.net>

 David Miller [mailto:davem@davemloft.net] 
> Sent: Saturday, December 20, 2014 4:44 AM
[...]
> > Don't submit the rx if the device is unplugged, linking down,
> > or stopped.
>  ...
> > @@ -1789,6 +1789,11 @@ int r8152_submit_rx(struct r8152 
> *tp, struct rx_agg *agg, gfp_t mem_flags)
> >  {
> >  	int ret;
> >  
> > +	/* The rx would be stopped, so skip submitting */
> > +	if (test_bit(RTL8152_UNPLUG, &tp->flags) ||
> > +	    !test_bit(WORK_ENABLE, &tp->flags) || !(tp->speed & LINK_STATUS))
> > +		return 0;
> > +
> 
> I think !netif_carrier_ok() should always be true in all three of those
> situations, and would be a much simpler test than what you've coded
> here.

When the device is unplugged or stopped, the linking status
may be true, so I add additional checks to avoid the submission.

Besides, in set_carrier() I set netif_carrier_on() after
ops.enable() to avoid any transmission before I finish
starting the tx/rx.

	tp->rtl_ops.enable(tp);
	set_bit(RTL8152_SET_RX_MODE, &tp->flags);
	netif_carrier_on(netdev);

However, the r8152_submit_rx() would be called in ops.enable(),
and the check of netif_carrier_ok() would be always false. That
is why I use tp->speed, not netif_carrier_ok(), to check the
link status.
 
Best Regards,
Hayes

^ permalink raw reply

* Re: [PATCH 2/2] MIPS: Hibernate: Restructure files and functions
From: Huacai Chen @ 2014-12-22  2:32 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: Srinivas Kandagatla, David S. Miller, netdev, Huacai Chen
In-Reply-To: <1419215296-27831-2-git-send-email-chenhc@lemote.com>

Sorry, sent to the wrong place...

On Mon, Dec 22, 2014 at 10:28 AM, Huacai Chen <chenhc@lemote.com> wrote:
> This patch has no functional changes; it just keeps the assembler
> code to a minimum. File and function naming is borrowed from x86.
>
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/power/Makefile                         |    2 +-
>  arch/mips/power/hibernate.c                      |   10 ++++++++++
>  arch/mips/power/{hibernate.S => hibernate_asm.S} |    6 ++----
>  3 files changed, 13 insertions(+), 5 deletions(-)
>  create mode 100644 arch/mips/power/hibernate.c
>  rename arch/mips/power/{hibernate.S => hibernate_asm.S} (90%)
>
> diff --git a/arch/mips/power/Makefile b/arch/mips/power/Makefile
> index 73d56b8..70bd788 100644
> --- a/arch/mips/power/Makefile
> +++ b/arch/mips/power/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_HIBERNATION) += cpu.o hibernate.o
> +obj-$(CONFIG_HIBERNATION) += cpu.o hibernate.o hibernate_asm.o
> diff --git a/arch/mips/power/hibernate.c b/arch/mips/power/hibernate.c
> new file mode 100644
> index 0000000..19a9af6
> --- /dev/null
> +++ b/arch/mips/power/hibernate.c
> @@ -0,0 +1,10 @@
> +#include <asm/tlbflush.h>
> +
> +extern int restore_image(void);
> +
> +int swsusp_arch_resume(void)
> +{
> +       /* Avoid TLB mismatch during and after kernel resume */
> +       local_flush_tlb_all();
> +       return restore_image();
> +}
> diff --git a/arch/mips/power/hibernate.S b/arch/mips/power/hibernate_asm.S
> similarity index 90%
> rename from arch/mips/power/hibernate.S
> rename to arch/mips/power/hibernate_asm.S
> index e7567c8..b1fab95 100644
> --- a/arch/mips/power/hibernate.S
> +++ b/arch/mips/power/hibernate_asm.S
> @@ -29,9 +29,7 @@ LEAF(swsusp_arch_suspend)
>         j swsusp_save
>  END(swsusp_arch_suspend)
>
> -LEAF(swsusp_arch_resume)
> -       /* Avoid TLB mismatch during and after kernel resume */
> -       jal local_flush_tlb_all
> +LEAF(restore_image)
>         PTR_L t0, restore_pblist
>  0:
>         PTR_L t1, PBE_ADDRESS(t0)   /* source */
> @@ -60,4 +58,4 @@ LEAF(swsusp_arch_resume)
>         PTR_L s7, PT_R23(t0)
>         PTR_LI v0, 0x0
>         jr ra
> -END(swsusp_arch_resume)
> +END(restore_image)
> --
> 1.7.7.3
>

^ permalink raw reply

* Re: [PATCH 1/2] MIPS: Hibernate: flush TLB entries earlier
From: Huacai Chen @ 2014-12-22  2:31 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: Srinivas Kandagatla, David S. Miller, netdev, Huacai Chen, stable
In-Reply-To: <1419215296-27831-1-git-send-email-chenhc@lemote.com>

Sorry, sent to the wrong place...

On Mon, Dec 22, 2014 at 10:28 AM, Huacai Chen <chenhc@lemote.com> wrote:
> We found that TLB mismatch not only happens after kernel resume, but
> also happens during snapshot restore. So move the TLB flush to the
> beginning of swsusp_arch_resume().
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/power/hibernate.S |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/mips/power/hibernate.S b/arch/mips/power/hibernate.S
> index 32a7c82..e7567c8 100644
> --- a/arch/mips/power/hibernate.S
> +++ b/arch/mips/power/hibernate.S
> @@ -30,6 +30,8 @@ LEAF(swsusp_arch_suspend)
>  END(swsusp_arch_suspend)
>
>  LEAF(swsusp_arch_resume)
> +       /* Avoid TLB mismatch during and after kernel resume */
> +       jal local_flush_tlb_all
>         PTR_L t0, restore_pblist
>  0:
>         PTR_L t1, PBE_ADDRESS(t0)   /* source */
> @@ -43,7 +45,6 @@ LEAF(swsusp_arch_resume)
>         bne t1, t3, 1b
>         PTR_L t0, PBE_NEXT(t0)
>         bnez t0, 0b
> -       jal local_flush_tlb_all /* Avoid TLB mismatch after kernel resume */
>         PTR_LA t0, saved_regs
>         PTR_L ra, PT_R31(t0)
>         PTR_L sp, PT_R29(t0)
> --
> 1.7.7.3
>

^ permalink raw reply
