Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] bonding: fix arp_validate on bonds inside a bridge
From: David Miller @ 2010-05-04 23:18 UTC (permalink / raw)
  To: fubar; +Cc: jbohac, bonding-devel, netdev
In-Reply-To: <17907.1272935182@death.nxdomain.ibm.com>

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Mon, 03 May 2010 18:06:22 -0700

> 	Tested and it looks to work as advertised.  I see only one minor
> nit, there's a pr_debug that missed being renamed to the new function
> name; here's the whole patch with that fixed.

I don't think you need the ugly arp hook.

Instead, it's much cleaner to provide a way for packet type taps to
see the packet before bridge et al. decapsulation.  In fact this makes
a lot of sense, wanting to see the device as __netif_receive_skb() saw
it, with no changes whatsoever.

In fact ptype_all runs before bridging, ING, and MACVLAN decap the
thing, so we could have a 'ptype_base_predecap[]' that we run over
right after those.

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04 23:07 UTC (permalink / raw)
  To: enh; +Cc: brian.haley, dlstevens, netdev, netdev-owner
In-Reply-To: <AANLkTilPHlQ-CDSWnUn52al5DJfRxuZ8eiXwf3ep5LUD@mail.gmail.com>

From: enh <enh@google.com>
Date: Tue, 4 May 2010 15:26:52 -0700

> On Tue, May 4, 2010 at 14:46, David Miller <davem@davemloft.net> wrote:
>> From: Brian Haley <brian.haley@hp.com>
>> Date: Tue, 04 May 2010 10:40:58 -0400
>>
>>> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
>>> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
>>>
>>> Signed-off-by: Brian Haley <brian.haley@hp.com>
>>
>> In cast it wasn't clear from my other reply, I'm not applying this
>> patch because I intentionally left this behavior there based upon
>> some comments from Elliot in that this lets developers get the
>> old default by asking for "-1" explicitly with a setsockopt.
> 
> (for the record, i don't need that behavior myself, and have no
> opinion on whether or not it makes sense for you guys... i'll only
> ever call setsockopt with 0 <= value <= 255. all i need is for the
> default when i never call setsockopt to be 1. for now, i've added a
> work-around where i explicitly call setsockopt with 1 when i create
> the socket.)

It's more of an issue of having at least some kind of compatability
story when we change this.

With what's in the tree now we can at least say "if you explicitly
setsockopt() the value to '-1' you will get the same behavior now
as beforehand"

Whereas with what others are suggesting, we can't give people a way
in their applications to do that other than to suggest they use
disgusting concoctions like "set non-multicast hoplimit to '-1',
getsockopt() that, then set the multicast hop explicitly to that"

And even that won't work the same as now, in that changes to the
per-route metric will be ignored.

^ permalink raw reply

* RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Pankaj Thakkar @ 2010-05-04 23:02 UTC (permalink / raw)
  To: linux-kernel, netdev, virtualization; +Cc: pv-drivers, sbhatewara

Device passthrough technology allows a guest to bypass the hypervisor and drive
the underlying physical device. VMware has been exploring various ways to
deliver this technology to users in a manner which is easy to adopt. In this
process we have prepared an architecture along with Intel - NPA (Network Plugin
Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to
passthrough to a number of physical NICs which support it. The document below
provides an overview of NPA.

We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
Linux users can exploit the benefits provided by passthrough devices in a
seamless manner while retaining the benefits of virtualization. The document
below tries to answer most of the questions which we anticipated. Please let us
know your comments and queries.

Thank you.

Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>

Network Plugin Architecture
---------------------------

VMware has been working on various device passthrough technologies for the past
few years. Passthrough technology is interesting as it can result in better
performance/cpu utilization for certain demanding applications. In our vSphere
product we support direct assignment of PCI devices like networking adapters to
a guest virtual machine. This allows the guest to drive the device using the
device drivers installed inside the guest. This is similar to the way KVM
allows for passthrough of PCI devices to the guests. The hypervisor is bypassed
for all I/O and control operations and hence it can not provide any value add
features such as live migration, suspend/resume, etc.

Network Plugin Architecture (NPA) is an approach which VMware has developed in
joint partnership with Intel which allows us to retain the best of passthrough
technology and virtualization. NPA allows for passthrough of the fast data
(I/O) path and lets the hypervisor deal with the slow control path using
traditional emulation/paravirtualization techniques. Through this splitting of
data and control path the hypervisor can still provide the above mentioned
value add features and exploit the performance benefits of passthrough.

NPA requires SR-IOV hardware which allows for sharing of one single NIC adapter
by multiple guests. SR-IOV hardware has many logically separate functions
called virtual functions (VF) which can be independently assigned to the guest
OS. They also have one or more physical functions (PF) (managed by a PF driver)
which are used by the hypervisor to control certain aspects of the VFs and the
rest of the hardware. NPA splits the guest driver into two components called
the Shell and the Plugin. The shell is responsible for interacting with the
guest networking stack and funneling the control operations to the hypervisor.
The plugin is responsible for driving the data path of the virtual function
exposed to the guest and is specific to the NIC hardware. NPA also requires an
embedded switch in the NIC to allow for switching traffic among the virtual
functions. The PF is also used as an uplink to provide connectivity to other
VMs which are in emulation mode. The figure below shows the major components in
a block diagram.

        +------------------------------+
        |         Guest VM             |
        |                              |
        |      +----------------+      |
        |      | vmxnet3 driver |      |
        |      |     Shell      |      |
        |      | +============+ |      |
        |      | |   Plugin   | |      |
        +------+-+------------+-+------+
                |           .
               +---------+  .
               | vmxnet3 |  .
               |___+-----+  .
                     |      .
                     |      .
                +----------------------------+
                |                            |
                |       virtual switch       |
                +----------------------------+
                  |         .               \
                  |         .                \
           +=============+  .                 \
           | PF control  |  .                  \
           |             |  .                   \
           |  L2 driver  |  .                    \
           +-------------+  .                     \
                  |         .                      \
                  |         .                       \
                +------------------------+     +------------+
                | PF   VF1 VF2 ...   VFn |     |            |
                |                        |     |  regular   |
                |       SR-IOV NIC       |     |    nic     |
                |    +--------------+    |     |   +--------+
                |    |   embedded   |    |     +---+
                |    |    switch    |    |
                |    +--------------+    |
                |        +---------------+
                +--------+

NPA offers several benefits:
1. Performance: Critical performance sensitive paths are not trapped and the
guest can directly drive the hardware without incurring virtualization
overheads.

2. Hypervisor control: All control operations from the guest such as programming
MAC address go through the hypervisor layer and hence can be subjected to
hypervisor policies. The PF driver can be further used to put policy decisions
like which VLAN the guest should be on.

3. Guest Management: No hardware specific drivers need to be installed in the
guest virtual machine and hence no overheads are incurred for guest management.
All software for the driver (including the PF driver and the plugin) is
installed in the hypervisor.

4. IHV independence: The architecture provides guidelines for splitting the
functionality between the VFs and PF but does not dictate how the hardware
should be implemented. It gives the IHV the freedom to do asynchronous updates
either to the software or the hardware to work around any defects.

The fundamental tenet in NPA is to let the hypervisor control the passthrough
functionality with minimal guest intervention. This gives a lot of flexibility
to the hypervisor which can then treat passthrough as an offload feature (just
like TSO, LRO, etc) which is offered to the guest virtual machine when there
are no conflicting features present. For example, if the hypervisor wants to
migrate the virtual machine from one host to another, the hypervisor can switch
the virtual machine out of passthrough mode into paravirtualized/emulated mode
and it can use existing technique to migrate the virtual machine. Once the
virtual machine is migrated to the destination host the hypervisor can switch
the virtual machine back to passthrough mode if a supporting SR-IOV nic is
present. This may involve reloading of a different plugin corresponding to the
new SR-IOV hardware.

Internally we have explored various other options before settling on the NPA
approach. For example there are approaches which create a bonding driver on top
of a complete passthrough of a NIC device and an emulated/paravirtualized
device. Though this approach allows for live migration to work it adds a lot of
complexity and dependency. First the hypervisor has to rely on a guest with
hot-add support. Second the hypervisor has to depend on the guest networking
stack to cooperate to perform migration. Third the guest has to carry the
driver images for all possible hardware to which the guest may migrate to.
Fourth the hypervisor does not get full control for all the policy decisions.
Another approach we have considered is to have a uniform interface for the data
path between the emulated/paravirtualized device and the hardware device which
allows the hypervisor to seamlessly switch from the emulated interface to the
hardware interface. Though this approach is very attractive and can work
without any guest involvement it is not acceptable to the IHVs as it does not
give them the freedom to fix bugs/erratas and differentiate from each other. We
believe NPA approach provides the right level of control and flexibility to the
hypervisors while letting the guest exploit the benefits of passthrough.

The plugin image is provided by the IHVs along with the PF driver and is
packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
either into a Linux VM or a Windows VM. The plugin is written against the Shell
API interface which the shell is responsible for implementing. The API
interface allows the plugin to do TX and RX only by programming the hardware
rings (along with things like buffer allocation and basic initialization). The
virtual machine comes up in paravirtualized/emulated mode when it is booted.
The hypervisor allocates the VF and other resources and notifies the shell of
the availability of the VF. The hypervisor injects the plugin into memory
location specified by the shell. The shell initializes the plugin by calling
into a known entry point and the plugin initializes the data path. The control
path is already initialized by the PF driver when the VF is allocated. At this
point the shell switches to using the loaded plugin to do all further TX and RX
operations. The guest networking stack does not participate in these operations
and continues to function normally. All the control operations continue being
trapped by the hypervisor and are directed to the PF driver as needed. For
example, if the MAC address changes the hypervisor updates its internal state
and changes the state of the embedded switch as well through the PF control
API.

We have reworked our existing Linux vmxnet3 driver to accomodate NPA by
splitting the driver into two parts: Shell and Plugin. The new split driver is
backwards compatible and continues to work on old/existing vmxnet3 device
emulations. The shell implements the API interface and contains code to do the
bookkeeping for TX/RX buffers along with interrupt management. The shell code
also handles the loading of the plugin and verifying the license of the loaded
plugin. The plugin contains the code specific to vmxnet3 ring and descriptor
management. The plugin uses the same Shell API interface which would be used by
other IHVs. This vmxnet3 plugin is compiled statically along with the shell as
this is needed to provide connectivity when there is no underlying SR-IOV
device present. The IHV plugins are required to be distributed under GPL
license and we are currently looking at ways to verify this both within the
hypervisor and within the shell.

^ permalink raw reply

* [RFC] net: change bridge/macvlan hook to be be generic
From: Stephen Hemminger @ 2010-05-04 22:37 UTC (permalink / raw)
  To: David Miller, Patrick McHardy; +Cc: netdev

The existing macvlan and bridge have special hooks in the packet input
path. This patch changes it to a generic hook chain, like the packet type
processing. I have been wanting to look into flow based switching, etc...

Pro: generic code rather than special purpose
     safer against race during insertion/removal
     easier for out of tree network code

Con: performance (loop vs unrolled) and calling module for each packet
     easier for out of tree network code

This is prototype only, don't use it as is...



---
 drivers/net/macvlan.c     |   24 ++++++----
 include/linux/netdevice.h |   19 ++++++++
 net/bridge/br.c           |    4 -
 net/bridge/br_fdb.c       |    5 ++
 net/bridge/br_input.c     |   25 ++++++++--
 net/bridge/br_private.h   |    3 -
 net/core/dev.c            |  107 +++++++++++++++++-----------------------------
 7 files changed, 103 insertions(+), 84 deletions(-)

--- a/drivers/net/macvlan.c	2010-05-04 15:15:54.532454436 -0700
+++ b/drivers/net/macvlan.c	2010-05-04 15:30:19.461536637 -0700
@@ -145,21 +145,24 @@ static void macvlan_broadcast(struct sk_
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct net_device *orig_dev,
+					    struct sk_buff *skb)
 {
-	const struct ethhdr *eth = eth_hdr(skb);
+	const struct ethhdr *eth;
 	const struct macvlan_port *port;
 	const struct macvlan_dev *vlan;
-	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
-	port = rcu_dereference(skb->dev->macvlan_port);
+	port = rcu_dereference(orig_dev->macvlan_port);
 	if (port == NULL)
 		return skb;
 
+	eth = eth_hdr(skb);
 	if (is_multicast_ether_addr(eth->h_dest)) {
-		src = macvlan_hash_lookup(port, eth->h_source);
+		const struct macvlan_dev *src
+			= macvlan_hash_lookup(port, eth->h_source);
+
 		if (!src)
 			/* frame comes from an external address */
 			macvlan_broadcast(skb, port, NULL,
@@ -759,19 +762,24 @@ static struct notifier_block macvlan_not
 	.notifier_call	= macvlan_device_event,
 };
 
+static struct packet_handler macvlan_hook = {
+	.priority = MACVLAN_HANDLER,
+	.func     = macvlan_handle_frame,
+};
+
 static int __init macvlan_init_module(void)
 {
 	int err;
 
 	register_netdevice_notifier(&macvlan_notifier_block);
-	macvlan_handle_frame_hook = macvlan_handle_frame;
 
 	err = macvlan_link_register(&macvlan_link_ops);
 	if (err < 0)
 		goto err1;
+
+	dev_add_handler(&macvlan_hook);
 	return 0;
 err1:
-	macvlan_handle_frame_hook = NULL;
 	unregister_netdevice_notifier(&macvlan_notifier_block);
 	return err;
 }
@@ -779,7 +787,7 @@ err1:
 static void __exit macvlan_cleanup_module(void)
 {
 	rtnl_link_unregister(&macvlan_link_ops);
-	macvlan_handle_frame_hook = NULL;
+	dev_remove_handler(&macvlan_hook);
 	unregister_netdevice_notifier(&macvlan_notifier_block);
 }
 
--- a/include/linux/netdevice.h	2010-05-04 15:15:54.592523341 -0700
+++ b/include/linux/netdevice.h	2010-05-04 15:26:14.741067048 -0700
@@ -1011,10 +1011,15 @@ struct net_device {
 	/* mid-layer private */
 	void			*ml_priv;
 
+
+#if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE)
 	/* bridge stuff */
 	struct net_bridge_port	*br_port;
+#endif
+#if defined(CONFIG_MACVLAN) || defined(CONFIG_MACVLAN_MODULE)
 	/* macvlan */
 	struct macvlan_port	*macvlan_port;
+#endif
 	/* GARP */
 	struct garp_port	*garp_port;
 
@@ -1204,6 +1209,17 @@ struct packet_type {
 	struct list_head	list;
 };
 
+enum handler_priority {
+	BRIDGE_HANDLER    = 1,
+	MACVLAN_HANDLER,
+};
+
+struct packet_handler {
+	struct list_head       list;
+	enum handler_priority  priority;
+	struct sk_buff *       (*func)(struct net_device *, struct sk_buff *);
+};
+
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 
@@ -1259,6 +1275,9 @@ extern void		dev_add_pack(struct packet_
 extern void		dev_remove_pack(struct packet_type *pt);
 extern void		__dev_remove_pack(struct packet_type *pt);
 
+extern void		dev_add_handler(struct packet_handler *h);
+extern void		dev_remove_handler(struct packet_handler *h);
+
 extern struct net_device	*dev_get_by_flags(struct net *net, unsigned short flags,
 						  unsigned short mask);
 extern struct net_device	*dev_get_by_name(struct net *net, const char *name);
--- a/net/bridge/br.c	2010-05-04 15:15:54.542453499 -0700
+++ b/net/bridge/br.c	2010-05-04 15:16:10.111257969 -0700
@@ -63,7 +63,7 @@ static int __init br_init(void)
 		goto err_out4;
 
 	brioctl_set(br_ioctl_deviceless_stub);
-	br_handle_frame_hook = br_handle_frame;
+	dev_add_handler(&br_packet_hook);
 
 #if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
 	br_fdb_test_addr_hook = br_fdb_test_addr;
@@ -100,7 +100,7 @@ static void __exit br_deinit(void)
 	br_fdb_test_addr_hook = NULL;
 #endif
 
-	br_handle_frame_hook = NULL;
+	dev_remove_handler(&br_packet_hook);
 	br_fdb_fini();
 }
 
--- a/net/bridge/br_fdb.c	2010-05-04 15:15:54.552508079 -0700
+++ b/net/bridge/br_fdb.c	2010-05-04 15:16:23.020967283 -0700
@@ -253,6 +253,11 @@ int br_fdb_test_addr(struct net_device *
 
 	return ret;
 }
+
+/* This hook is defined here for ATM LANE */
+int (*br_fdb_test_addr_hook)(struct net_device *dev,
+			     unsigned char *addr) __read_mostly;
+EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
 #endif /* CONFIG_ATM_LANE */
 
 /*
--- a/net/bridge/br_input.c	2010-05-04 15:15:54.562453677 -0700
+++ b/net/bridge/br_input.c	2010-05-04 15:21:35.521115299 -0700
@@ -133,13 +133,19 @@ static inline int is_link_local(const un
 /*
  * Called via br_handle_frame_hook.
  * Return NULL if skb is handled
- * note: already called with rcu_read_lock (preempt_disabled)
+ * note: already called with rcu_read_lock
  */
-struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
+static struct sk_buff *br_handle_frame(struct net_device *dev,
+				       struct sk_buff *skb)
 {
-	const unsigned char *dest = eth_hdr(skb)->h_dest;
+	const unsigned char *dest;
+	struct net_bridge_port *port;
 	int (*rhook)(struct sk_buff *skb);
 
+	if (skb->pkt_type == PACKET_LOOPBACK ||
+	    (port = rcu_dereference(skb->dev->br_port)) == NULL)
+		return skb;
+
 	if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
 		goto drop;
 
@@ -147,13 +153,14 @@ struct sk_buff *br_handle_frame(struct n
 	if (!skb)
 		return NULL;
 
+	dest = eth_hdr(skb)->h_dest;
 	if (unlikely(is_link_local(dest))) {
 		/* Pause frames shouldn't be passed up by driver anyway */
 		if (skb->protocol == htons(ETH_P_PAUSE))
 			goto drop;
 
 		/* If STP is turned off, then forward */
-		if (p->br->stp_enabled == BR_NO_STP && dest[5] == 0)
+		if (port->br->stp_enabled == BR_NO_STP && dest[5] == 0)
 			goto forward;
 
 		if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
@@ -164,7 +171,7 @@ struct sk_buff *br_handle_frame(struct n
 	}
 
 forward:
-	switch (p->state) {
+	switch (port->state) {
 	case BR_STATE_FORWARDING:
 		rhook = rcu_dereference(br_should_route_hook);
 		if (rhook != NULL) {
@@ -174,7 +181,7 @@ forward:
 		}
 		/* fall through */
 	case BR_STATE_LEARNING:
-		if (!compare_ether_addr(p->br->dev->dev_addr, dest))
+		if (!compare_ether_addr(port->br->dev->dev_addr, dest))
 			skb->pkt_type = PACKET_HOST;
 
 		NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
@@ -186,3 +193,9 @@ drop:
 	}
 	return NULL;
 }
+
+struct packet_handler br_packet_hook = {
+	.priority = BRIDGE_HANDLER,
+	.func     = br_handle_frame,
+};
+
--- a/net/bridge/br_private.h	2010-05-04 15:15:54.542453499 -0700
+++ b/net/bridge/br_private.h	2010-05-04 15:16:10.111257969 -0700
@@ -300,8 +300,7 @@ extern void br_features_recompute(struct
 
 /* br_input.c */
 extern int br_handle_frame_finish(struct sk_buff *skb);
-extern struct sk_buff *br_handle_frame(struct net_bridge_port *p,
-				       struct sk_buff *skb);
+extern struct packet_handler br_packet_hook;
 
 /* br_ioctl.c */
 extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
--- a/net/core/dev.c	2010-05-04 15:15:54.572453666 -0700
+++ b/net/core/dev.c	2010-05-04 15:23:03.945389538 -0700
@@ -175,6 +175,9 @@ static DEFINE_SPINLOCK(ptype_lock);
 static struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 static struct list_head ptype_all __read_mostly;	/* Taps */
 
+static DEFINE_SPINLOCK(phook_lock);
+static struct list_head packet_hook __read_mostly;
+
 /*
  * The @dev_base_head list is protected by @dev_base_lock and the rtnl
  * semaphore.
@@ -459,6 +462,33 @@ void dev_remove_pack(struct packet_type 
 }
 EXPORT_SYMBOL(dev_remove_pack);
 
+void dev_add_handler(struct packet_handler *nh)
+{
+	struct packet_handler *h;
+	struct list_head *slot = &packet_hook;
+
+	spin_lock_bh(&phook_lock);
+	list_for_each_entry(h, &packet_hook, list) {
+		if (h->priority > nh->priority)
+			break;
+		slot = &h->list;
+	}
+	list_add_rcu(&nh->list, slot);
+	spin_unlock_bh(&phook_lock);
+
+}
+EXPORT_SYMBOL_GPL(dev_add_handler);
+
+
+void dev_remove_handler(struct packet_handler *h)
+{
+	spin_lock_bh(&phook_lock);
+	list_del_rcu(&h->list);
+	spin_unlock_bh(&phook_lock);
+	synchronize_net();
+}
+EXPORT_SYMBOL_GPL(dev_remove_handler);
+
 /******************************************************************************
 
 		      Device Boot-time Settings Routines
@@ -2566,66 +2596,6 @@ static inline int deliver_skb(struct sk_
 	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
 
-#if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE)
-
-#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
-/* This hook is defined here for ATM LANE */
-int (*br_fdb_test_addr_hook)(struct net_device *dev,
-			     unsigned char *addr) __read_mostly;
-EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
-#endif
-
-/*
- * If bridge module is loaded call bridging hook.
- *  returns NULL if packet was consumed.
- */
-struct sk_buff *(*br_handle_frame_hook)(struct net_bridge_port *p,
-					struct sk_buff *skb) __read_mostly;
-EXPORT_SYMBOL_GPL(br_handle_frame_hook);
-
-static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
-					    struct packet_type **pt_prev, int *ret,
-					    struct net_device *orig_dev)
-{
-	struct net_bridge_port *port;
-
-	if (skb->pkt_type == PACKET_LOOPBACK ||
-	    (port = rcu_dereference(skb->dev->br_port)) == NULL)
-		return skb;
-
-	if (*pt_prev) {
-		*ret = deliver_skb(skb, *pt_prev, orig_dev);
-		*pt_prev = NULL;
-	}
-
-	return br_handle_frame_hook(port, skb);
-}
-#else
-#define handle_bridge(skb, pt_prev, ret, orig_dev)	(skb)
-#endif
-
-#if defined(CONFIG_MACVLAN) || defined(CONFIG_MACVLAN_MODULE)
-struct sk_buff *(*macvlan_handle_frame_hook)(struct sk_buff *skb) __read_mostly;
-EXPORT_SYMBOL_GPL(macvlan_handle_frame_hook);
-
-static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
-					     struct packet_type **pt_prev,
-					     int *ret,
-					     struct net_device *orig_dev)
-{
-	if (skb->dev->macvlan_port == NULL)
-		return skb;
-
-	if (*pt_prev) {
-		*ret = deliver_skb(skb, *pt_prev, orig_dev);
-		*pt_prev = NULL;
-	}
-	return macvlan_handle_frame_hook(skb);
-}
-#else
-#define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
-#endif
-
 #ifdef CONFIG_NET_CLS_ACT
 /* TODO: Maybe we should just force sch_ingress to be compiled in
  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
@@ -2773,6 +2743,7 @@ EXPORT_SYMBOL(__skb_bond_should_drop);
 static int __netif_receive_skb(struct sk_buff *skb)
 {
 	struct packet_type *ptype, *pt_prev;
+	struct packet_handler *phook;
 	struct net_device *orig_dev;
 	struct net_device *master;
 	struct net_device *null_or_orig;
@@ -2835,13 +2806,15 @@ static int __netif_receive_skb(struct sk
 		goto out;
 ncls:
 #endif
+	if (pt_prev)
+		ret = deliver_skb(skb, pt_prev, orig_dev);
 
-	skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
-	if (!skb)
-		goto out;
-	skb = handle_macvlan(skb, &pt_prev, &ret, orig_dev);
-	if (!skb)
-		goto out;
+	/* Process special hooks used for bridging and macvlan */
+	list_for_each_entry_rcu(phook, &packet_hook, list) {
+		skb = (*phook->func)(orig_dev, skb);
+		if (!skb)
+			goto out;
+	}
 
 	/*
 	 * Make sure frames received on VLAN interfaces stacked on
@@ -2856,6 +2829,7 @@ ncls:
 	}
 
 	type = skb->protocol;
+	pt_prev = NULL;
 	list_for_each_entry_rcu(ptype,
 			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
 		if (ptype->type == type && (ptype->dev == null_or_orig ||
@@ -5855,6 +5829,7 @@ static int __init net_dev_init(void)
 	INIT_LIST_HEAD(&ptype_all);
 	for (i = 0; i < PTYPE_HASH_SIZE; i++)
 		INIT_LIST_HEAD(&ptype_base[i]);
+	INIT_LIST_HEAD(&packet_hook);
 
 	if (register_pernet_subsys(&netdev_net_ops))
 		goto out;

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: enh @ 2010-05-04 22:26 UTC (permalink / raw)
  To: David Miller; +Cc: brian.haley, dlstevens, netdev, netdev-owner
In-Reply-To: <20100504.144647.157477097.davem@davemloft.net>

On Tue, May 4, 2010 at 14:46, David Miller <davem@davemloft.net> wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 04 May 2010 10:40:58 -0400
>
>> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
>> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
>>
>> Signed-off-by: Brian Haley <brian.haley@hp.com>
>
> In cast it wasn't clear from my other reply, I'm not applying this
> patch because I intentionally left this behavior there based upon
> some comments from Elliot in that this lets developers get the
> old default by asking for "-1" explicitly with a setsockopt.

(for the record, i don't need that behavior myself, and have no
opinion on whether or not it makes sense for you guys... i'll only
ever call setsockopt with 0 <= value <= 255. all i need is for the
default when i never call setsockopt to be 1. for now, i've added a
work-around where i explicitly call setsockopt with 1 when i create
the socket.)

-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/

^ permalink raw reply

* Re: [PATCH 1/1] net/usb: initiate sync sequence in sierra_net.c driver
From: Elina Pasheva @ 2010-05-04 22:15 UTC (permalink / raw)
  To: David Miller
  Cc: dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
	Rory Filer, linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20100504.145357.111558943.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Tue, 2010-05-04 at 14:53 -0700, David Miller wrote:
> From: Elina Pasheva <epasheva-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8@public.gmane.org>
> Date: Tue, 4 May 2010 13:18:22 -0700
> 
> > What are the chances of this patch to be in the kernel-2.6.34 release?
> > Without this patch the driver sierra_net (currently in kernel-2.6.34-rc6) 
> > will not work.
> 
> It's sitting in my net-2.6 tree which means it will go to Linus the
> next time I send him a pull request.  I have no idea when that will
> be, but hopefully it will be soon.
> 
Sounds good. Thank you, David.
Elina

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/1] net/usb: initiate sync sequence in sierra_net.c driver
From: David Miller @ 2010-05-04 21:53 UTC (permalink / raw)
  To: epasheva-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8
  Cc: dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f,
	rfiler-usJ8hg9QsBLWpu6QEFMNjNBPR1lH4CV8,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1273004302.6718.10.camel@Linuxdev4-laptop>

From: Elina Pasheva <epasheva-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8@public.gmane.org>
Date: Tue, 4 May 2010 13:18:22 -0700

> What are the chances of this patch to be in the kernel-2.6.34 release?
> Without this patch the driver sierra_net (currently in kernel-2.6.34-rc6) 
> will not work.

It's sitting in my net-2.6 tree which means it will go to Linus the
next time I send him a pull request.  I have no idea when that will
be, but hopefully it will be soon.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04 21:46 UTC (permalink / raw)
  To: brian.haley; +Cc: dlstevens, enh, netdev, netdev-owner
In-Reply-To: <4BE031FA.6040006@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Tue, 04 May 2010 10:40:58 -0400

> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
> 
> Signed-off-by: Brian Haley <brian.haley@hp.com>

In cast it wasn't clear from my other reply, I'm not applying this
patch because I intentionally left this behavior there based upon
some comments from Elliot in that this lets developers get the
old default by asking for "-1" explicitly with a setsockopt.

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04 21:39 UTC (permalink / raw)
  To: dlstevens; +Cc: brian.haley, enh, netdev, netdev-owner
In-Reply-To: <OFD462DFA3.FF279B8C-ON88257719.0057B1F8-88257719.005914D6@us.ibm.com>

From: David Stevens <dlstevens@us.ibm.com>
Date: Tue, 4 May 2010 09:12:59 -0700

> But if the default is different for each destination or interface in
> the multicast case (ie, by adding conf settings for mcast), then
> it really should do late binding and leave it as "-1" in the set, right?
> That's what I thought it was already doing, but apparently not;
> I think it used to, but maybe I just didn't notice.

Unlike other people in this thread who sometimes aren't even checking
how the current code works, I checked all of the available source
control history in this area and this code has always behaved this way.

Right from day one.

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04 21:38 UTC (permalink / raw)
  To: brian.haley; +Cc: dlstevens, enh, netdev, netdev-owner
In-Reply-To: <4BE031FA.6040006@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Tue, 04 May 2010 10:40:58 -0400

> Not exactly.  It fixes the case where it's wrong by default, but
> the corner case of setting it to -1 via setsockopt() says:
> 
>     x == -1:       use kernel default
> 
> But that will revert back to the kernel using 64 on the next transmit.
> I can work on an update to this that makes a new mcast_hops per-interface
> setting and makes ip6_dst_hoplimit() aware of it.  Or even easier, just
> have setsockopt() trap the -1 and set np->mcast_hops to 1.  Built but
> untested patch below.

I thought that we agreed that when the user explicitly asks for "-1"
it should get the behavior right now, with is to use
ip6_dst_hoplimit()?  I think I even acknowledged when Elliot mentioned
this explicitly, and I think it's a good idea.

^ permalink raw reply

* 2.6.34-rc6-git2: Reported regressions from 2.6.33
From: Rafael J. Wysocki @ 2010-05-04 20:49 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Maciej Rutecki, Andrew Morton, Linus Torvalds,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Linux Wireless List, DRI

This message contains a list of some regressions from 2.6.33,
for which there are no fixes in the mainline known to the tracking team.
If any of them have been fixed already, please let us know.

If you know of any other unresolved regressions from 2.6.33, please let us
know either and we'll add them to the list.  Also, please let us know
if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply
to this message with CCs to the people involved in reporting and handling
the issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2010-05-04       76       26          22
  2010-04-20       64       35          34
  2010-04-07       48       35          33
  2010-03-21       15       13          10


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15880
Subject		: Very bad regression from 2.6.33 as of 1600f9def
Submitter	: Alex Elsayed <eternaleye@gmail.com>
Date		: 2010-04-29 2:28 (6 days old)
Message-ID	: <loom.20100429T041908-663@post.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127250825306178&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15863
Subject		: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
Submitter	: Miles Lane <miles.lane@gmail.com>
Date		: 2010-04-27 0:51 (8 days old)
Message-ID	: <h2ya44ae5cd1004261751waa5cb65ei3d139cbcfa2cc5cf@mail.gmail.com>
References	: http://marc.info/?l=linux-kernel&m=127232949104878&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15862
Subject		: 2.6.34-rc4/5: iwlagn unusable until reload
Submitter	: Nico Schottelius <nico-linux-20100427@schottelius.org>
Date		: 2010-04-27 7:49 (8 days old)
Message-ID	: <20100427074934.GB3261@ikn.schottelius.org>
References	: http://marc.info/?l=linux-kernel&m=127235784004839&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15858
Subject		: [2.6.34-rc5] bad page state copying to/from HFS+ filesystem...
Submitter	: Daniel J Blueman <daniel.blueman@gmail.com>
Date		: 2010-04-25 21:14 (10 days old)
Message-ID	: <v2k6278d2221004251414kbbcc41baw78b86120d81dce7d@mail.gmail.com>
References	: http://marc.info/?l=linux-kernel&m=127223008621881&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15805
Subject		: reiserfs locking
Submitter	: Alexander Beregalov <a.beregalov@gmail.com>
Date		: 2010-04-15 21:02 (20 days old)
Message-ID	: <t2ka4423d671004151402n7b2dc425mdc9c6bb9640d63fb@mail.gmail.com>
References	: http://marc.info/?l=linux-kernel&m=127136535323933&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15790
Subject		: Meta-Bug: Regressions
Submitter	: Florian Mickler <fmickler@gmx.de>
Date		: 2010-04-15 18:21 (20 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15788
Subject		: external usb sound card doesn't work after resume
Submitter	: François Valenduc <francois.valenduc@tvcablenet.be>
Date		: 2010-04-15 10:16 (20 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15717
Subject		: bluetooth oops
Submitter	: Pavel Machek <pavel@ucw.cz>
Date		: 2010-03-14 20:14 (52 days old)
Message-ID	: <20100314201434.GE22059@elf.ucw.cz>
References	: http://marc.info/?l=linux-kernel&m=126859771528426&w=4
Handled-By	: Marcel Holtmann <marcel@holtmann.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15713
Subject		: hackbench regression due to commit 9dfc6e68bfe6e
Submitter	: Alex Shi <alex.shi@intel.com>
Date		: 2010-03-25 8:40 (41 days old)
First-Bad-Commit: http://git.kernel.org/linus/9dfc6e68bfe6ee452efb1a4e9ca26a9007f2b864
Message-ID	: <1269506457.4513.141.camel@alexs-hp.sh.intel.com>
References	: http://marc.info/?l=linux-kernel&m=126950632920682&w=4
Handled-By	: Christoph Lameter <cl@linux-foundation.org>
		  Pekka Enberg <penberg@cs.helsinki.fi>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15712
Subject		: [regression] 2.6.34-rc1 to -rc3 on zaurus: no longer boots
Submitter	: Pavel Machek <pavel@ucw.cz>
Date		: 2010-04-01 6:06 (34 days old)
Message-ID	: <20100401060624.GA1329@ucw.cz>
References	: http://marc.info/?l=linux-kernel&m=127010200817402&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15704
Subject		: [r8169] WARNING: at net/sched/sch_generic.c
Submitter	: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date		: 2010-03-31 10:21 (35 days old)
Message-ID	: <<20100331102142.GA3294@swordfish.minsk.epam.com>>
References	: http://marc.info/?l=linux-kernel&m=127003090406108&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15673
Subject		: 2.6.34-rc2: "ima_dec_counts: open/free imbalance"?
Submitter	: Thomas Meyer <thomas@m3y3r.de>
Date		: 2010-03-28 11:31 (38 days old)
Message-ID	: <1269775909.5301.4.camel@localhost.localdomain>
References	: http://marc.info/?l=linux-kernel&m=126977593326800&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15671
Subject		: intel graphic card hanging (Hangcheck timer elapsed... GPU hung)
Submitter	: Norbert Preining <preining@logic.at>
Date		: 2010-03-27 16:11 (39 days old)
Message-ID	: <20100327161104.GA12043@gamma.logic.tuwien.ac.at>
References	: http://marc.info/?l=linux-kernel&m=126970883105262&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15669
Subject		: INFO: suspicious rcu_dereference_check()
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2010-03-08 1:26 (58 days old)
Message-ID	: <c4e36d111003250348q678eb2e6w4f3e8133e7fd6e58@mail.gmail.com>
References	: http://marc.info/?l=linux-kernel&m=126801163107713&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15664
Subject		: Graphics hang and kernel backtrace when starting Azureus with Compiz enabled
Submitter	: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec>
Date		: 2010-04-01 01:09 (34 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15659
Subject		: [Regresion] [2.6.34-rc1] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Submitter	: Maciej Rutecki <maciej.rutecki@gmail.com>
Date		: 2010-03-25 20:04 (41 days old)
Message-ID	: <<201003252104.24965.maciej.rutecki@gmail.com>>
References	: http://marc.info/?l=linux-kernel&m=126954749618319&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15625
Subject		: BUG: 2.6.34-rc1, RIP is (null)
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2010-03-18 22:22 (48 days old)
Message-ID	: <4BA2A7A9.4080503@oracle.com>
References	: http://marc.info/?l=linux-kernel&m=126895098217351&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15611
Subject		: Failure with the 2.6.34-rc1 kernel
Submitter	: Rupjyoti Sarmah <rsarmah@amcc.com>
Date		: 2010-03-16 15:45 (50 days old)
Message-ID	: <AC311A8E81420D4EBC1F26E6479848FE065B7D3D@SDCEXCHANGE01.ad.amcc.com>
References	: http://marc.info/?l=linux-kernel&m=126875435718396&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15610
Subject		: fsck leads to swapper - BUG: unable to handle kernel NULL pointer dereference & panic
Submitter	: Ozgur Yuksel <ozgur.yuksel@oracle.com>
Date		: 2010-03-22 15:59 (44 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15601
Subject		: [BUG] SLOB breaks Crypto
Submitter	: michael-dev@fami-braun.de
Date		: 2010-03-15 13:39 (51 days old)
Message-ID	: <4B9E38AF.70309@fami-braun.de>
References	: http://marc.info/?l=linux-kernel&m=126866044724539&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15590
Subject		: 2.6.34-rc1: regression: ^Z no longer stops sound
Submitter	: Pavel Machek <pavel@ucw.cz>
Date		: 2010-03-14 7:58 (52 days old)
Message-ID	: <20100314075831.GA13457@elf.ucw.cz>
References	: http://marc.info/?l=linux-kernel&m=126855353122623&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15589
Subject		: 2.6.34-rc1: Badness at fs/proc/generic.c:316
Submitter	: Christian Kujau <lists@nerdbynature.de>
Date		: 2010-03-13 23:53 (53 days old)
Message-ID	: <alpine.DEB.2.01.1003131544340.5493@bogon.housecafe.de>
References	: http://marc.info/?l=linux-kernel&m=126852442903680&w=2


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15893
Subject		: Alps: No edge-scrolling since 2.6.34-rc5
Submitter	: Felix Zweig <f.zweig@yahoo.de>
Date		: 2010-05-01 16:23 (4 days old)
Handled-By	:  Dan Carpenter <error27@gmail.com>
Patch		: https://bugzilla.kernel.org/attachment.cgi?id=26218


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15768
Subject		: Incorrectly calculated free blocks result in ENOSPC from writepage
Submitter	: Dmitry Monakhov <dmonakhov@openvz.org>
Date		: 2010-04-12 11:24 (23 days old)
Handled-By	: Dmitry Monakhov <dmonakhov@openvz.org>
Patch		: http://patchwork.ozlabs.org/patch/49989/


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15729
Subject		: BUG: physmap modprobe & rmmod
Submitter	: Randy Dunlap <randy.dunlap@oracle.com>
Date		: 2010-04-02 20:40 (33 days old)
Message-ID	: <20100402134058.c4682716.randy.dunlap@oracle.com>
References	: http://marc.info/?l=linux-kernel&m=127024096210230&w=2
Handled-By	: Hartley Sweeten <hsweeten@visionengravers.com>
Patch		: https://patchwork.kernel.org/patch/90497/


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15505
Subject		: No more b43 wireless interface since 2.6.34-rc1
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2010-03-10 06:59 (56 days old)
Handled-By	: Yinghai Lu <yinghai@kernel.org>
Patch		: https://bugzilla.kernel.org/show_bug.cgi?id=15505#c11


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.33,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=15310

Please let the tracking team know if there are any Bugzilla entries that
should be added to the list in there.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH 1/1] net/usb: initiate sync sequence in sierra_net.c driver
From: Elina Pasheva @ 2010-05-04 20:18 UTC (permalink / raw)
  To: David Miller, dbrownell@users.sourceforge.net
  Cc: epasheva, Rory Filer, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20100501.180829.139101312.davem@davemloft.net>

>> On Saturday, May 01, 2010 6:08 PM David Miller wrote:

>>>From: Elina Pasheva <epasheva@sierrawireless.com>
>>>Date: Wed, 28 Apr 2010 16:28:24 -0700

>>> Subject: [PATCH 1/1] net/usb: initiate sync sequence in sierra_net.c driver
>>> From: Elina Pasheva <epasheva@sierrawireless.com>
>>>
>>> The following patch adds the initiation of the sync sequence to
>>> "sierra_net_bind()". If this step is omitted, the modem will never sync up
>>> with the host and it will not be possible to establish a data connection.
>>> This is a high priority patch.
>>>
>>> This patch has been checked against net-2.6 tree.
>>> Signed-off-by: Elina Pasheva <epasheva@sierrawireless.com>
>>> Signed-off-by: Rory Filer <rfiler@sierrawireless.com>
>>> Tested-by: Elina Pasheva <epasheva@sierrawireless.com>

>>Applied.

>Thank you very much, David!
>Elina

Hi David,
What are the chances of this patch to be in the kernel-2.6.34 release?
Without this patch the driver sierra_net (currently in kernel-2.6.34-rc6) 
will not work.

Thank you,
Elina





^ permalink raw reply

* Re: virtio: put last_used and last_avail index into ring itself.
From: Michael S. Tsirkin @ 2010-05-04 18:22 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, virtualization, kvm, linux-kernel, mingo, linux-mm, akpm,
	hpa, gregory.haskins, s.hetze, Daniel Walker, Eric Dumazet
In-Reply-To: <200911091647.29655.rusty@rustcorp.com.au>

> virtio: put last_used and last_avail index into ring itself.
> 
> Generally, the other end of the virtio ring doesn't need to see where
> you're up to in consuming the ring.  However, to completely understand
> what's going on from the outside, this information must be exposed.
> For example, if you want to save and restore a virtio_ring, but you're
> not the consumer because the kernel is using it directly.
> 
> Fortunately, we have room to expand: the ring is always a whole number
> of pages and there's hundreds of bytes of padding after the avail ring
> and the used ring, whatever the number of descriptors (which must be a
> power of 2).
> 
> We add a feature bit so the guest can tell the host that it's writing
> out the current value there, if it wants to use that.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

I've been looking at this patch some more (more on why
later), and I wonder: would it be better to add some
alignment to the last used index address, so that
if we later add more stuff at the tail, it all
fits in a single cache line?

We use a new feature bit anyway, so layout change should not be
a problem.

Since I raised the question of caches: for used ring,
the ring is not aligned to 64 bit, so on CPUs with 64 bit
or larger cache lines, used entries will often cross
cache line boundaries. Am I right and might it
have been better to align ring entries to cache line boundaries?

What do you think?

> ---
>  drivers/virtio/virtio_ring.c |   23 +++++++++++++++--------
>  include/linux/virtio_ring.h  |   12 +++++++++++-
>  2 files changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -71,9 +71,6 @@ struct vring_virtqueue
>  	/* Number we've added since last sync. */
>  	unsigned int num_added;
>  
> -	/* Last used index we've seen. */
> -	u16 last_used_idx;
> -
>  	/* How to notify other side. FIXME: commonalize hcalls! */
>  	void (*notify)(struct virtqueue *vq);
>  
> @@ -278,12 +275,13 @@ static void detach_buf(struct vring_virt
>  
>  static inline bool more_used(const struct vring_virtqueue *vq)
>  {
> -	return vq->last_used_idx != vq->vring.used->idx;
> +	return vring_last_used(&vq->vring) != vq->vring.used->idx;
>  }
>  
>  static void *vring_get_buf(struct virtqueue *_vq, unsigned int *len)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> +	struct vring_used_elem *u;
>  	void *ret;
>  	unsigned int i;
>  
> @@ -300,8 +298,11 @@ static void *vring_get_buf(struct virtqu
>  		return NULL;
>  	}
>  
> -	i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id;
> -	*len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len;
> +	u = &vq->vring.used->ring[vring_last_used(&vq->vring) % vq->vring.num];
> +	i = u->id;
> +	*len = u->len;
> +	/* Make sure we don't reload i after doing checks. */
> +	rmb();
>  
>  	if (unlikely(i >= vq->vring.num)) {
>  		BAD_RING(vq, "id %u out of range\n", i);
> @@ -315,7 +316,8 @@ static void *vring_get_buf(struct virtqu
>  	/* detach_buf clears data, so grab it now. */
>  	ret = vq->data[i];
>  	detach_buf(vq, i);
> -	vq->last_used_idx++;
> +	vring_last_used(&vq->vring)++;
> +
>  	END_USE(vq);
>  	return ret;
>  }
> @@ -402,7 +404,6 @@ struct virtqueue *vring_new_virtqueue(un
>  	vq->vq.name = name;
>  	vq->notify = notify;
>  	vq->broken = false;
> -	vq->last_used_idx = 0;
>  	vq->num_added = 0;
>  	list_add_tail(&vq->vq.list, &vdev->vqs);
>  #ifdef DEBUG
> @@ -413,6 +414,10 @@ struct virtqueue *vring_new_virtqueue(un
>  
>  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
>  
> +	/* We publish indices whether they offer it or not: if not, it's junk
> +	 * space anyway.  But calling this acknowledges the feature. */
> +	virtio_has_feature(vdev, VIRTIO_RING_F_PUBLISH_INDICES);
> +
>  	/* No callback?  Tell other side not to bother us. */
>  	if (!callback)
>  		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
> @@ -443,6 +448,8 @@ void vring_transport_features(struct vir
>  		switch (i) {
>  		case VIRTIO_RING_F_INDIRECT_DESC:
>  			break;
> +		case VIRTIO_RING_F_PUBLISH_INDICES:
> +			break;
>  		default:
>  			/* We don't understand this bit. */
>  			clear_bit(i, vdev->features);
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -29,6 +29,9 @@
>  /* We support indirect buffer descriptors */
>  #define VIRTIO_RING_F_INDIRECT_DESC	28
>  
> +/* We publish our last-seen used index at the end of the avail ring. */
> +#define VIRTIO_RING_F_PUBLISH_INDICES	29
> +
>  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>  struct vring_desc
>  {
> @@ -87,6 +90,7 @@ struct vring {
>   *	__u16 avail_flags;
>   *	__u16 avail_idx;
>   *	__u16 available[num];
> + *	__u16 last_used_idx;
>   *
>   *	// Padding to the next align boundary.
>   *	char pad[];
> @@ -95,6 +99,7 @@ struct vring {
>   *	__u16 used_flags;
>   *	__u16 used_idx;
>   *	struct vring_used_elem used[num];
> + *	__u16 last_avail_idx;
>   * };
>   */
>  static inline void vring_init(struct vring *vr, unsigned int num, void *p,
> @@ -111,9 +116,14 @@ static inline unsigned vring_size(unsign
>  {
>  	return ((sizeof(struct vring_desc) * num + sizeof(__u16) * (2 + num)
>  		 + align - 1) & ~(align - 1))
> -		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num;
> +		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num + 2;
>  }
>  
> +/* We publish the last-seen used index at the end of the available ring, and
> + * vice-versa.  These are at the end for backwards compatibility. */
> +#define vring_last_used(vr) ((vr)->avail->ring[(vr)->num])
> +#define vring_last_avail(vr) (*(__u16 *)&(vr)->used->ring[(vr)->num])
> +
>  #ifdef __KERNEL__
>  #include <linux/irqreturn.h>
>  struct virtio_device;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Stephen Hemminger @ 2010-05-04 17:13 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Ben Hutchings, netdev
In-Reply-To: <l2k571fb4001005041008k9b129a06vf5e40db2d119434c@mail.gmail.com>

On Tue, 4 May 2010 22:38:49 +0530
Bhaskar Dutta <bhaskie@gmail.com> wrote:

> On Tue, May 4, 2010 at 9:42 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > On Tue, 4 May 2010 19:58:32 +0530
> > Bhaskar Dutta <bhaskie@gmail.com> wrote:
> >
> >> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> >> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
> >> >> Hi,
> >> >>
> >> >> I am observing intermittent TCP-MD5 checksum failures
> >> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
> >> >>
> >> >> The problem is only seen in multi-core 64 bit machines.
> >> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
> >> >> that the percpu allocator has been re-implemented in 2.6.33) that
> >> >> might cause a corruption in 64 bit SMP machines?
> >> >>
> >> >> Any pointers would be appreciated.
> >> >
> >> > There was another recent report of incorrect MD5 signatures in
> >> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
> >> > response.
> >> >
> >> > Ben.
> >> >
> >>
> >> I found another thread posted back in Jan 2007 with a similar bug
> >> (x86_64 on 2.6.20) but no replies to that as well.
> >> http://lkml.org/lkml/2007/1/20/56
> >
> > 2.6.20 had lots of other MD5 bugs. Your problem might be related to
> > GRO.  MD5 may not handle multi-fragment packets.
> > --
> 
> I am getting the issue on 2.6.31 and 2.6.28 (gro infrastructure was
> added in 2.6.29).
> Also, both segmentation offloading as well as receive offloading
> (gso/gro) are turned off.
> 
> Moreover outgoing TCP packets are the ones with the corrupt checksums.
> Both tcpdump on my local machine and the BGP router on the other side
> complain of the bad checksums with the same packet.
> 
> I am trying to figure out if there is something in the per-cpu
> implementation that might be causing a corruption (SMP and x86_64) but
> I am not really getting anywhere.

I seriously doubt the per-cpu stuff is the issue.

> I am trying to reproduce the bad checksums with the latest kernel
> sources since it has a new implementation of the percpu allocator.

First turn off all offload settings on the device (TSO,GSO,SG,CSUM)
then check that size of the bad packets. Are they fragmented or
just simple linear packets?



-- 

^ permalink raw reply

* [PATCH net-next-2.6] net: __alloc_skb() speedup
From: Eric Dumazet @ 2010-05-04 17:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jamal, Tom Herbert

With following patch I can reach maximum rate of my pktgen+udpsink
simulator :
- 'old' machine : dual quad core E5450  @3.00GHz
- 64 UDP rx flows (only differ by destination port)
- RPS enabled, NIC interrupts serviced on cpu0
- rps dispatched on 7 other cores. (~130.000 IPI per second)
- SLAB allocator (faster than SLUB in this workload)
- tg3 NIC
- 1.080.000 pps without a single drop at NIC level.

Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch
first sk_buff cache line, the second to prefetch the shinfo part.

Also using one memset() to initialize all skb_shared_info fields instead
of one by one to reduce number of instructions, using long word moves.

All skb_shared_info fields before 'dataref' are cleared in 
__alloc_skb().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |    7 ++++++-
 net/core/skbuff.c      |   21 +++++----------------
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 746a652..f32ccc9 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -187,7 +187,6 @@ union skb_shared_tx {
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
-	atomic_t	dataref;
 	unsigned short	nr_frags;
 	unsigned short	gso_size;
 	/* Warning: this field is not always filled in (UFO)! */
@@ -197,6 +196,12 @@ struct skb_shared_info {
 	union skb_shared_tx tx_flags;
 	struct sk_buff	*frag_list;
 	struct skb_shared_hwtstamps hwtstamps;
+
+	/*
+	 * Warning : all fields before dataref are cleared in __alloc_skb()
+	 */
+	atomic_t	dataref;
+	
 	skb_frag_t	frags[MAX_SKB_FRAGS];
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8b9c109..a9b0e1f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -181,12 +181,14 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
 	if (!skb)
 		goto out;
+	prefetchw(skb);
 
 	size = SKB_DATA_ALIGN(size);
 	data = kmalloc_node_track_caller(size + sizeof(struct skb_shared_info),
 			gfp_mask, node);
 	if (!data)
 		goto nodata;
+	prefetchw(data + size);
 
 	/*
 	 * Only clear those fields we need to clear, not those that we will
@@ -208,15 +210,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 	/* make sure we initialize shinfo sequentially */
 	shinfo = skb_shinfo(skb);
+	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
 	atomic_set(&shinfo->dataref, 1);
-	shinfo->nr_frags  = 0;
-	shinfo->gso_size = 0;
-	shinfo->gso_segs = 0;
-	shinfo->gso_type = 0;
-	shinfo->ip6_frag_id = 0;
-	shinfo->tx_flags.flags = 0;
-	skb_frag_list_init(skb);
-	memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps));
 
 	if (fclone) {
 		struct sk_buff *child = skb + 1;
@@ -505,16 +500,10 @@ int skb_recycle_check(struct sk_buff *skb, int skb_size)
 		return 0;
 
 	skb_release_head_state(skb);
+
 	shinfo = skb_shinfo(skb);
+	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
 	atomic_set(&shinfo->dataref, 1);
-	shinfo->nr_frags = 0;
-	shinfo->gso_size = 0;
-	shinfo->gso_segs = 0;
-	shinfo->gso_type = 0;
-	shinfo->ip6_frag_id = 0;
-	shinfo->tx_flags.flags = 0;
-	skb_frag_list_init(skb);
-	memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps));
 
 	memset(skb, 0, offsetof(struct sk_buff, tail));
 	skb->data = skb->head + NET_SKB_PAD;



^ permalink raw reply related

* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Bhaskar Dutta @ 2010-05-04 17:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Hutchings, netdev
In-Reply-To: <20100504091215.5a4a51f4@nehalam>

On Tue, May 4, 2010 at 9:42 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Tue, 4 May 2010 19:58:32 +0530
> Bhaskar Dutta <bhaskie@gmail.com> wrote:
>
>> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
>> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
>> >> Hi,
>> >>
>> >> I am observing intermittent TCP-MD5 checksum failures
>> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
>> >>
>> >> The problem is only seen in multi-core 64 bit machines.
>> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
>> >> that the percpu allocator has been re-implemented in 2.6.33) that
>> >> might cause a corruption in 64 bit SMP machines?
>> >>
>> >> Any pointers would be appreciated.
>> >
>> > There was another recent report of incorrect MD5 signatures in
>> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
>> > response.
>> >
>> > Ben.
>> >
>>
>> I found another thread posted back in Jan 2007 with a similar bug
>> (x86_64 on 2.6.20) but no replies to that as well.
>> http://lkml.org/lkml/2007/1/20/56
>
> 2.6.20 had lots of other MD5 bugs. Your problem might be related to
> GRO.  MD5 may not handle multi-fragment packets.
> --

I am getting the issue on 2.6.31 and 2.6.28 (gro infrastructure was
added in 2.6.29).
Also, both segmentation offloading as well as receive offloading
(gso/gro) are turned off.

Moreover outgoing TCP packets are the ones with the corrupt checksums.
Both tcpdump on my local machine and the BGP router on the other side
complain of the bad checksums with the same packet.

I am trying to figure out if there is something in the per-cpu
implementation that might be causing a corruption (SMP and x86_64) but
I am not really getting anywhere.

I am trying to reproduce the bad checksums with the latest kernel
sources since it has a new implementation of the percpu allocator.

Any pointers would be highly appreciated!

Thanks,
Bhaskar

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Stevens @ 2010-05-04 17:05 UTC (permalink / raw)
  To: Brian Haley; +Cc: David Miller, enh, netdev, netdev-owner
In-Reply-To: <4BE04E99.2080903@hp.com>

 > Yes, that would be the ideal fix, and give the admin more control over
> the value, but it seems like overkill to me.  It's been 64 for a while,
> and it's always been changeable by apps.  I guess the only thing to
> think about is there could be an app that works because it being 64
> today, but will break tomorrow.  Having a tunable parameter will let
> you get the app working without re-writing it.

        Well, it should've been 1, and any app relying on having
multicast routing really should set it explicitly.
        I think per-interface defaulting to 1 should be ok. I'd
prefer carrying the "-1" so apps that set it get what they want
and apps that don't carry the current default, rather than the
value at the time the socket was created, but practically it probably
doesn't matter. In reality, apps that need more than one will
already be setting it to a non-default value.

                                                        +-DLS


^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: Brian Haley @ 2010-05-04 16:43 UTC (permalink / raw)
  To: David Stevens; +Cc: David Miller, enh, netdev, netdev-owner
In-Reply-To: <OFD462DFA3.FF279B8C-ON88257719.0057B1F8-88257719.005914D6@us.ibm.com>

David Stevens wrote:
> I think the original code was intending to do late binding -- carry "-1" 
> as
> meaning "not set by user" and use the default value _at_the_time_of_
> _the_send_, and in its context. For that to have worked, the checks for
> "<0" in the send paths should've checked for multicast and used the
> multicast default as you're saying, Brian. And doing that not on the
> set, but when generating packets, is what I would've expected.

Right, we could do it that way, but then how far do we unravel the thread?
Unicast hoplimit is settable in the route, do we add a mcast_hops there
too, in addition to the per-interface tunable?  I think just having it
the recommended default is good enough here, until someone shows they
have the need to do more.

> I don't see anything that's broken by changing it to use the default at
> the time of the set since for mcast the default is really a constant,
> and in fact, it looks like in addition to not actually using the default 
> of 1,
> it was returning "-1" in the cmsg when not set by the user (and it, too,
> should've been "1", which it would return now).
> 
> But if the default is different for each destination or interface in
> the multicast case (ie, by adding conf settings for mcast), then
> it really should do late binding and leave it as "-1" in the set, right?
> That's what I thought it was already doing, but apparently not;
> I think it used to, but maybe I just didn't notice.

Yes, that would be the ideal fix, and give the admin more control over
the value, but it seems like overkill to me.  It's been 64 for a while,
and it's always been changeable by apps.  I guess the only thing to
think about is there could be an app that works because it being 64
today, but will break tomorrow.  Having a tunable parameter will let
you get the app working without re-writing it.

-Brian

^ permalink raw reply

* [PATCH 3/3] cxgb4i: iscsi and common pdu processing library
From: Rakesh Ranjan @ 2010-05-04 16:43 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, kxie-ut6Up61K2wZBDgjK7y7TUQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk,
	michaelc-hcNo3dDEHLuVc3sceRu5cw, Rakesh Ranjan
In-Reply-To: <1272991388-8527-3-git-send-email-rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

From: Rakesh Ranjan <rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>


Signed-off-by: Rakesh Ranjan <rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
 drivers/scsi/cxgb4i/cxgb4i_iscsi.c |  715 ++++++++++++++++++++++++++++++++++++
 drivers/scsi/cxgb4i/libcxgbi.c     |  464 +++++++++++++++++++++++
 drivers/scsi/cxgb4i/libcxgbi.h     |   61 +++
 3 files changed, 1240 insertions(+), 0 deletions(-)
 create mode 100644 drivers/scsi/cxgb4i/cxgb4i_iscsi.c
 create mode 100644 drivers/scsi/cxgb4i/libcxgbi.c
 create mode 100644 drivers/scsi/cxgb4i/libcxgbi.h

diff --git a/drivers/scsi/cxgb4i/cxgb4i_iscsi.c b/drivers/scsi/cxgb4i/cxgb4i_iscsi.c
new file mode 100644
index 0000000..c76fed2
--- /dev/null
+++ b/drivers/scsi/cxgb4i/cxgb4i_iscsi.c
@@ -0,0 +1,715 @@
+/*
+ * cxgb4i_iscsi.c: Chelsio T4 iSCSI driver.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#include <linux/inet.h>
+#include <linux/crypto.h>
+#include <linux/if_vlan.h>
+#include <net/dst.h>
+#include <net/tcp.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_eh.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi.h>
+#include <scsi/iscsi_proto.h>
+#include <scsi/libiscsi.h>
+#include <scsi/scsi_transport_iscsi.h>
+
+#include "cxgb4i.h"
+
+/*
+ * align pdu size to multiple of 512 for better performance
+ */
+#define cxgb4i_align_pdu_size(n) do { n = (n) & (~511); } while (0)
+
+static struct scsi_transport_template *cxgb4i_scsi_transport;
+static struct scsi_host_template cxgb4i_host_template;
+static struct iscsi_transport cxgb4i_iscsi_transport;
+
+struct cxgb4i_hba *cxgb4i_hba_add(struct cxgb4i_snic *snic,
+					struct net_device *dev)
+{
+	struct cxgb4i_hba *chba;
+	struct Scsi_Host *shost;
+	int err;
+
+	shost = iscsi_host_alloc(&cxgb4i_host_template, sizeof(*chba), 1);
+
+	if (!shost) {
+		cxgb4i_log_info("snic 0x%p, ndev 0x%p, host alloc failed\n",
+				snic, dev);
+		return NULL;
+	}
+
+	shost->transportt = cxgb4i_scsi_transport;
+	shost->max_lun = CXGB4I_MAX_LUN;
+	shost->max_id = CXGB4I_MAX_TARGET;
+	shost->max_channel = 0;
+	shost->max_cmd_len = 16;
+
+	chba = iscsi_host_priv(shost);
+	chba->snic = snic;
+	chba->ndev = dev;
+	chba->shost = shost;
+
+	pci_dev_get(snic->lldi.pdev);
+	err = iscsi_host_add(shost, &snic->lldi.pdev->dev);
+	if (err) {
+		cxgb4i_log_info("snic 0x%p, dev 0x%p, host add failed\n",
+				snic, dev);
+		goto pci_dev_put;
+	}
+
+	return chba;
+
+pci_dev_put:
+	pci_dev_put(snic->lldi.pdev);
+	scsi_host_put(shost);
+	return NULL;
+}
+
+void cxgb4i_hba_remove(struct cxgb4i_hba *chba)
+{
+	iscsi_host_remove(chba->shost);
+	pci_dev_put(chba->snic->lldi.pdev);
+	iscsi_host_free(chba->shost);
+}
+
+static struct iscsi_endpoint *cxgb4i_ep_connect(struct Scsi_Host *shost,
+						struct sockaddr *dst_addr,
+						int non_blocking)
+{
+	struct iscsi_endpoint *iep;
+	struct cxgb4i_endpoint *cep;
+	struct cxgb4i_hba *hba = NULL;
+	struct cxgb4i_sock *csk = NULL;
+	struct cxgb4i_snic *snic;
+	int err = 0;
+
+	if (shost)
+		hba = iscsi_host_priv(shost);
+
+	snic = cxgb4i_find_snic(hba ? hba->ndev : NULL,
+			((struct sockaddr_in *)dst_addr)->sin_addr.s_addr);
+	if (!snic) {
+		cxgb4i_log_info("ep connect no snic\n");
+		err = -ENOSPC;
+		goto release_conn;
+	}
+
+	csk = cxgb4i_sock_create(snic);
+	if (!csk) {
+		cxgb4i_log_info("ep connect OOM\n");
+		err = -ENOMEM;
+		goto release_conn;
+	}
+	err = cxgb4i_sock_connect(hba ? hba->ndev : NULL, csk,
+				(struct sockaddr_in *)dst_addr);
+	if (err < 0) {
+		cxgb4i_log_info("ep connect failed\n");
+		goto release_conn;
+	}
+
+	hba = cxgb4i_hba_find_by_netdev(csk->dst->dev);
+	if (!hba) {
+		err = -ENOSPC;
+		cxgb4i_log_info("Not going through cxgb4i device\n");
+		goto release_conn;
+	}
+
+	if (shost && hba != iscsi_host_priv(shost)) {
+		err = -ENOSPC;
+		cxgb4i_log_info("Could not connect through request host %u\n",
+				shost->host_no);
+		goto release_conn;
+	}
+
+	if (cxgb4i_sock_is_closing(csk)) {
+		err = -ENOSPC;
+		cxgb4i_log_info("ep connect unable to connect\n");
+		goto release_conn;
+	}
+
+	iep = iscsi_create_endpoint(sizeof(*cep));
+	if (!iep) {
+		err = -ENOMEM;
+		cxgb4i_log_info("iscsi alloc ep, OOM\n");
+		goto release_conn;
+	}
+
+	cep = iep->dd_data;
+	cep->csk = csk;
+	cep->chba = hba;
+
+	cxgb4i_api_debug("iep 0x%p, cep 0x%p, csk 0x%p, hba 0x%p\n",
+			iep, cep, csk, hba);
+
+	return iep;
+
+release_conn:
+	cxgb4i_api_debug("conn 0x%p failed, release\n", csk);
+	if (csk)
+		cxgb4i_sock_release(csk);
+
+	return ERR_PTR(err);
+}
+
+static int cxgb4i_ep_poll(struct iscsi_endpoint *ep, int timeout_ms)
+{
+	struct cxgb4i_endpoint *cep = ep->dd_data;
+	struct cxgb4i_sock *csk = cep->csk;
+
+	if (!cxgb4i_sock_is_established(csk))
+		return 0;
+
+	return 1;
+}
+
+static void cxgb4i_ep_disconnect(struct iscsi_endpoint *ep)
+{
+	struct cxgb4i_endpoint *cep = ep->dd_data;
+	struct cxgb4i_conn *cconn = cep->cconn;
+
+	if (cconn && cconn->iconn) {
+		iscsi_suspend_tx(cconn->iconn);
+
+		write_lock_bh(&cep->csk->callback_lock);
+		cep->csk->user_data = NULL;
+		cconn->cep = NULL;
+		write_unlock_bh(&cep->csk->callback_lock);
+	}
+
+	cxgb4i_sock_release(cep->csk);
+	iscsi_destroy_endpoint(ep);
+}
+
+static struct iscsi_cls_session *
+cxgb4i_create_session(struct iscsi_endpoint *ep, u16 cmds_max, u16 qdepth,
+							u32 initial_cmdsn)
+{
+	struct cxgb4i_endpoint *cep;
+	struct cxgb4i_hba *chba;
+	struct Scsi_Host *shost;
+	struct iscsi_cls_session *cls_session;
+	struct iscsi_session *session;
+
+	if (!ep) {
+		cxgb4i_log_error("missing endpoint\n");
+		return NULL;
+	}
+
+	cep = ep->dd_data;
+	chba = cep->chba;
+	shost = chba->shost;
+
+	BUG_ON(chba != iscsi_host_priv(shost));
+
+	cls_session = iscsi_session_setup(&cxgb4i_iscsi_transport, shost,
+					cmds_max, 0,
+					sizeof(struct iscsi_tcp_task) +
+					sizeof(struct cxgb4i_task_data),
+					initial_cmdsn, ISCSI_MAX_TARGET);
+	if (!cls_session)
+		return NULL;
+
+	session = cls_session->dd_data;
+	if (iscsi_tcp_r2tpool_alloc(session))
+		goto remove_session;
+
+	return cls_session;
+
+remove_session:
+	iscsi_session_teardown(cls_session);
+	return NULL;
+}
+
+static void cxgb4i_destroy_session(struct iscsi_cls_session *cls_session)
+{
+	iscsi_tcp_r2tpool_free(cls_session->dd_data);
+	iscsi_session_teardown(cls_session);
+}
+
+static inline int cxgb4i_conn_max_xmit_dlength(struct iscsi_conn *conn)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	unsigned int max = max(512 * MAX_SKB_FRAGS, SKB_TX_HEADROOM);
+
+	max = min(cconn->chba->snic->tx_max_size, max);
+	if (conn->max_xmit_dlength)
+		conn->max_xmit_dlength = min(conn->max_xmit_dlength, max);
+	else
+		conn->max_xmit_dlength = max;
+	cxgb4i_align_pdu_size(conn->max_xmit_dlength);
+	return 0;
+}
+
+static inline int cxgb4i_conn_max_recv_dlength(struct iscsi_conn *conn)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	unsigned int max = cconn->chba->snic->rx_max_size;
+
+	cxgb4i_align_pdu_size(max);
+
+	if (conn->max_recv_dlength) {
+		if (conn->max_recv_dlength > max) {
+			cxgb4i_log_error("MaxRecvDataSegmentLength %u too big."
+					" Need to be <= %u.\n",
+					conn->max_recv_dlength, max);
+			return -EINVAL;
+		}
+		conn->max_recv_dlength = min(conn->max_recv_dlength, max);
+		cxgb4i_align_pdu_size(conn->max_recv_dlength);
+	} else
+		conn->max_recv_dlength = max;
+
+	return 0;
+}
+
+static struct iscsi_cls_conn *
+cxgb4i_create_conn(struct iscsi_cls_session *cls_session, u32 cid)
+{
+	struct iscsi_cls_conn *cls_conn;
+	struct iscsi_conn *conn;
+	struct iscsi_tcp_conn *tcp_conn;
+	struct cxgb4i_conn *cconn;
+
+	cls_conn = iscsi_tcp_conn_setup(cls_session, sizeof(*cconn), cid);
+	if (!cls_conn)
+		return NULL;
+
+	conn = cls_conn->dd_data;
+	tcp_conn = conn->dd_data;
+	cconn = tcp_conn->dd_data;
+
+	cconn->iconn = conn;
+	return cls_conn;
+}
+
+static int cxgb4i_bind_conn(struct iscsi_cls_session *cls_session,
+				struct iscsi_cls_conn *cls_conn,
+				u64 transport_eph, int is_leading)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	struct cxgb4i_snic *snic;
+	struct iscsi_endpoint *ep;
+	struct cxgb4i_endpoint *cep;
+	struct cxgb4i_sock *csk;
+	int err;
+
+	ep = iscsi_lookup_endpoint(transport_eph);
+	if (!ep)
+		return -EINVAL;
+
+	/*  setup ddp pagesize */
+	cep = ep->dd_data;
+	csk = cep->csk;
+	snic = cep->chba->snic;
+	err = cxgb4i_ddp_setup_conn_host_pagesize(csk, csk->hwtid, 0);
+	if (err < 0)
+		return err;
+
+	err = iscsi_conn_bind(cls_session, cls_conn, is_leading);
+	if (err)
+		return -EINVAL;
+
+	/*  calculate the tag idx bits needed for this conn based on cmds_max */
+	cconn->task_idx_bits = (__ilog2_u32(conn->session->cmds_max - 1)) + 1;
+
+	read_lock(&csk->callback_lock);
+	csk->user_data = conn;
+	cconn->chba = cep->chba;
+	cconn->cep = cep;
+	cep->cconn = cconn;
+	read_unlock(&csk->callback_lock);
+
+	cxgb4i_conn_max_xmit_dlength(conn);
+	cxgb4i_conn_max_recv_dlength(conn);
+
+	spin_lock_bh(&conn->session->lock);
+	sprintf(conn->portal_address, "%pI4", &csk->daddr.sin_addr.s_addr);
+	conn->portal_port = ntohs(csk->daddr.sin_port);
+	spin_unlock_bh(&conn->session->lock);
+
+	/*  init recv engine */
+	iscsi_tcp_hdr_recv_prep(tcp_conn);
+
+	return 0;
+}
+
+static int
+cxgb4i_get_conn_param(struct iscsi_cls_conn *cls_conn,
+			enum iscsi_param param, char *buff)
+{
+	struct iscsi_conn *iconn = cls_conn->dd_data;
+	int len;
+
+	switch (param) {
+	case ISCSI_PARAM_CONN_PORT:
+		spin_lock_bh(&iconn->session->lock);
+		len = sprintf(buff, "%hu\n", iconn->portal_port);
+		spin_unlock_bh(&iconn->session->lock);
+		break;
+	case ISCSI_PARAM_CONN_ADDRESS:
+		spin_lock_bh(&iconn->session->lock);
+		len = sprintf(buff, "%s\n", iconn->portal_address);
+		spin_unlock_bh(&iconn->session->lock);
+		break;
+	default:
+		return iscsi_conn_get_param(cls_conn, param, buff);
+	}
+	return len;
+}
+
+static int
+cxgb4i_set_conn_param(struct iscsi_cls_conn *cls_conn,
+			enum iscsi_param param, char *buf, int buflen)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	struct iscsi_session *session = conn->session;
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	struct cxgb4i_sock *csk = cconn->cep->csk;
+	int value, err = 0;
+
+	switch (param) {
+	case ISCSI_PARAM_HDRDGST_EN:
+		err = iscsi_set_param(cls_conn, param, buf, buflen);
+		if (!err && conn->hdrdgst_en)
+			err = cxgb4i_ddp_setup_conn_digest(csk, csk->hwtid,
+							conn->hdrdgst_en,
+							conn->datadgst_en, 0);
+		break;
+	case ISCSI_PARAM_DATADGST_EN:
+		err = iscsi_set_param(cls_conn, param, buf, buflen);
+		if (!err && conn->datadgst_en)
+			err = cxgb4i_ddp_setup_conn_digest(csk, csk->hwtid,
+							conn->hdrdgst_en,
+							conn->datadgst_en, 0);
+		break;
+	case ISCSI_PARAM_MAX_R2T:
+		sscanf(buf, "%d", &value);
+		if (value <= 0 || !is_power_of_2(value))
+			return -EINVAL;
+		if (session->max_r2t == value)
+			break;
+		iscsi_tcp_r2tpool_free(session);
+		err = iscsi_set_param(cls_conn, param, buf, buflen);
+		if (!err && iscsi_tcp_r2tpool_alloc(session))
+			return -ENOMEM;
+	case ISCSI_PARAM_MAX_RECV_DLENGTH:
+		err = iscsi_set_param(cls_conn, param, buf, buflen);
+		if (!err)
+			err = cxgb4i_conn_max_recv_dlength(conn);
+		break;
+	case ISCSI_PARAM_MAX_XMIT_DLENGTH:
+		err = iscsi_set_param(cls_conn, param, buf, buflen);
+		if (!err)
+			err = cxgb4i_conn_max_xmit_dlength(conn);
+		break;
+	default:
+		return iscsi_set_param(cls_conn, param, buf, buflen);
+	}
+	return err;
+}
+
+static int
+cxgb4i_set_host_param(struct Scsi_Host *shost,
+			enum iscsi_host_param param, char *buff, int buflen)
+{
+	struct cxgb4i_hba *chba = iscsi_host_priv(shost);
+
+	if (!chba->ndev) {
+		shost_printk(KERN_ERR, shost, "Could not set host param. "
+				"Netdev for host not set\n");
+		return -ENODEV;
+	}
+
+	cxgb4i_api_debug("param %d, buff %s\n", param, buff);
+
+	switch (param) {
+	case ISCSI_HOST_PARAM_IPADDRESS:
+	{
+		__be32 addr = in_aton(buff);
+		cxgb4i_set_iscsi_ipv4(chba, addr);
+		return 0;
+	}
+
+	case ISCSI_HOST_PARAM_HWADDRESS:
+	case ISCSI_HOST_PARAM_NETDEV_NAME:
+		return 0;
+
+	default:
+		return iscsi_host_set_param(shost, param, buff, buflen);
+	}
+}
+
+static int
+cxgb4i_get_host_param(struct Scsi_Host *shost,
+			enum iscsi_host_param param, char *buff)
+{
+	struct cxgb4i_hba *chba = iscsi_host_priv(shost);
+	int len = 0;
+
+	if (!chba->ndev) {
+		shost_printk(KERN_ERR, shost, "Could not set host param. "
+				"Netdev for host not set\n");
+		return -ENODEV;
+	}
+
+	cxgb4i_api_debug("hba %s, param %d\n", chba->ndev->name, param);
+
+	switch (param) {
+	case ISCSI_HOST_PARAM_HWADDRESS:
+		len = sysfs_format_mac(buff, chba->ndev->dev_addr, 6);
+		break;
+	case ISCSI_HOST_PARAM_NETDEV_NAME:
+		len = sprintf(buff, "%s\n", chba->ndev->name);
+		break;
+	case ISCSI_HOST_PARAM_IPADDRESS:
+	{
+		__be32 addr;
+
+		addr = cxgb4i_get_iscsi_ipv4(chba);
+		len = sprintf(buff, "%pI4", &addr);
+		break;
+	}
+	default:
+		return iscsi_host_get_param(shost, param, buff);
+	}
+
+	return len;
+}
+
+static void cxgb4i_get_conn_stats(struct iscsi_cls_conn *cls_conn,
+				struct iscsi_stats *stats)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+
+	stats->txdata_octets = conn->txdata_octets;
+	stats->rxdata_octets = conn->rxdata_octets;
+	stats->scsicmd_pdus = conn->scsicmd_pdus_cnt;
+	stats->dataout_pdus = conn->dataout_pdus_cnt;
+	stats->scsirsp_pdus = conn->scsirsp_pdus_cnt;
+	stats->datain_pdus = conn->datain_pdus_cnt;
+	stats->r2t_pdus = conn->r2t_pdus_cnt;
+	stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt;
+	stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt;
+	stats->digest_err = 0;
+	stats->timeout_err = 0;
+	stats->custom_length = 1;
+	strcpy(stats->custom[0].desc, "eh_abort_cnt");
+	stats->custom[0].value = conn->eh_abort_cnt;
+}
+
+static void cxgb4i_parse_pdu_itt(struct iscsi_conn *conn, itt_t itt,
+							int *idx, int *age)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	struct cxgb4i_snic *snic = cconn->chba->snic;
+	u32 tag = ntohl((__force u32) itt);
+	u32 sw_bits;
+
+	sw_bits = cxgb4i_tag_nonrsvd_bits(&snic->tag_format, tag);
+	if (idx)
+		*idx = sw_bits & ((1 << cconn->task_idx_bits) - 1);
+	if (age)
+		*age = (sw_bits >> cconn->task_idx_bits) & ISCSI_AGE_MASK;
+
+	cxgb4i_tag_debug("parse tag 0x%x/0x%x, sw 0x%x, itt 0x%x, age 0x%x.\n",
+			tag, itt, sw_bits, idx ? *idx : 0xFFFFF,
+			age ? *age : 0xFF);
+}
+
+int cxgb4i_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt)
+{
+	struct scsi_cmnd *sc = task->sc;
+	struct iscsi_conn *conn = task->conn;
+	struct iscsi_session *sess = conn->session;
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	struct cxgb4i_snic *snic = cconn->chba->snic;
+	struct cxgb4i_tag_format *tformat = &snic->tag_format;
+	u32 sw_tag = (sess->age << cconn->task_idx_bits) | task->itt;
+	u32 tag;
+	int err = -EINVAL;
+
+	if (sc &&
+		(scsi_bidi_cmnd(sc) ||
+		 sc->sc_data_direction == DMA_FROM_DEVICE) &&
+			cxgb4i_sw_tag_usable(tformat, sw_tag)) {
+
+		struct cxgb4i_sock *csk = cconn->cep->csk;
+		struct cxgb4i_gather_list *gl;
+
+		gl = cxgb4i_ddp_make_gl(scsi_in(sc)->length,
+					scsi_in(sc)->table.sgl,
+					scsi_in(sc)->table.nents,
+					snic->lldi.pdev,
+					GFP_ATOMIC);
+		if (gl) {
+			tag = sw_tag;
+			err = cxgb4i_ddp_tag_reserve(snic, csk->hwtid,
+							tformat, &tag,
+							gl, GFP_ATOMIC);
+			if (err < 0)
+				cxgb4i_ddp_release_gl(gl, snic->lldi.pdev);
+		}
+	}
+	if (err < 0)
+		tag = cxgb4i_set_non_ddp_tag(tformat, sw_tag);
+	/*  the itt need to sent in big-endian order */
+	*hdr_itt = (__force itt_t)htonl(tag);
+
+	cxgb4i_tag_debug("new sc 0x%p tag 0x%x/0x%x (itt 0x%x, age 0x%x).\n",
+			sc, tag, *hdr_itt, task->itt, sess->age);
+	return 0;
+}
+
+void cxgb4i_release_itt(struct iscsi_task *task, itt_t hdr_itt)
+{
+	struct scsi_cmnd *sc = task->sc;
+	struct iscsi_tcp_conn *tcp_conn = task->conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	struct cxgb4i_snic *snic = cconn->chba->snic;
+	struct cxgb4i_tag_format *tformat = &snic->tag_format;
+	u32 tag = ntohl((__force u32)hdr_itt);
+
+	cxgb4i_tag_debug("release tag 0x%x.\n", tag);
+
+	if (sc &&
+		(scsi_bidi_cmnd(sc) ||
+		 sc->sc_data_direction == DMA_FROM_DEVICE) &&
+			cxgb4i_is_ddp_tag(tformat, tag))
+		cxgb4i_ddp_tag_release(snic, tag);
+}
+
+void cxgb4i_cleanup_task(struct iscsi_task *task)
+{
+	struct cxgb4i_task_data *tdata = task->dd_data +
+				sizeof(struct iscsi_tcp_task);
+
+	/*  never reached the xmit task callout */
+	if (tdata->skb)
+		__kfree_skb(tdata->skb);
+	memset(tdata, 0, sizeof(*tdata));
+
+	cxgb4i_release_itt(task, task->hdr_itt);
+	iscsi_tcp_cleanup_task(task);
+}
+
+static struct scsi_host_template cxgb4i_host_template = {
+	.module				= THIS_MODULE,
+	.name				= "Chelsio T4 iSCSI initiator",
+	.proc_name			= "cxgb4i",
+	.queuecommand			= iscsi_queuecommand,
+	.change_queue_depth		= iscsi_change_queue_depth,
+	.can_queue			= CXGB4I_SCSI_HOST_QDEPTH,
+	.sg_tablesize			= SG_ALL,
+	.max_sectors			= 0xFFFF,
+	.cmd_per_lun			= ISCSI_DEF_CMD_PER_LUN,
+	.eh_abort_handler		= iscsi_eh_abort,
+	.eh_device_reset_handler	= iscsi_eh_device_reset,
+	.eh_target_reset_handler	= iscsi_eh_recover_target,
+	.target_alloc			= iscsi_target_alloc,
+	.use_clustering			= DISABLE_CLUSTERING,
+	.this_id			= -1,
+};
+
+#define	CXGB4I_CAPS	(CAP_RECOVERY_L0 | CAP_MULTI_R2T |	\
+			CAP_HDRDGST | CAP_DATADGST |		\
+			CAP_DIGEST_OFFLOAD | CAP_PADDING_OFFLOAD)
+#define	CXGB4I_PMASK	(ISCSI_MAX_RECV_DLENGTH | ISCSI_MAX_XMIT_DLENGTH | \
+			ISCSI_HDRDGST_EN | ISCSI_DATADGST_EN | \
+			ISCSI_INITIAL_R2T_EN | ISCSI_MAX_R2T | \
+			ISCSI_IMM_DATA_EN | ISCSI_FIRST_BURST | \
+			ISCSI_MAX_BURST | ISCSI_PDU_INORDER_EN | \
+			ISCSI_DATASEQ_INORDER_EN | ISCSI_ERL | \
+			ISCSI_CONN_PORT | ISCSI_CONN_ADDRESS | \
+			ISCSI_EXP_STATSN | ISCSI_PERSISTENT_PORT | \
+			ISCSI_PERSISTENT_ADDRESS | ISCSI_TARGET_NAME | \
+			ISCSI_TPGT | ISCSI_USERNAME | \
+			ISCSI_PASSWORD | ISCSI_USERNAME_IN | \
+			ISCSI_PASSWORD_IN | ISCSI_FAST_ABORT | \
+			ISCSI_ABORT_TMO | ISCSI_LU_RESET_TMO | \
+			/*ISCSI_TGT_RESET_TMO |*/ ISCSI_PING_TMO | \
+			ISCSI_RECV_TMO | ISCSI_IFACE_NAME | \
+			ISCSI_INITIATOR_NAME)
+#define	CXGB4I_HPMASK	(ISCSI_HOST_HWADDRESS | ISCSI_HOST_IPADDRESS | \
+			ISCSI_HOST_INITIATOR_NAME | ISCSI_HOST_INITIATOR_NAME)
+
+static struct iscsi_transport cxgb4i_iscsi_transport = {
+	.owner				= THIS_MODULE,
+	.name				= "cxgb4i",
+	.caps				= CXGB4I_CAPS,
+	.param_mask			= CXGB4I_PMASK,
+	.host_param_mask		= CXGB4I_HPMASK,
+	.get_host_param			= cxgb4i_get_host_param,
+	.set_host_param			= cxgb4i_set_host_param,
+
+	.create_session			= cxgb4i_create_session,
+	.destroy_session		= cxgb4i_destroy_session,
+	.get_session_param		= iscsi_session_get_param,
+
+	.create_conn			= cxgb4i_create_conn,
+	.bind_conn			= cxgb4i_bind_conn,
+	.destroy_conn			= iscsi_tcp_conn_teardown,
+	.start_conn			= iscsi_conn_start,
+	.stop_conn			= iscsi_conn_stop,
+	.get_conn_param			= cxgb4i_get_conn_param,
+	.set_param			= cxgb4i_set_conn_param,
+	.get_stats			= cxgb4i_get_conn_stats,
+
+	.send_pdu			= iscsi_conn_send_pdu,
+
+	.init_task			= iscsi_tcp_task_init,
+	.xmit_task			= iscsi_tcp_task_xmit,
+	.cleanup_task			= cxgb4i_cleanup_task,
+
+	.alloc_pdu			= cxgb4i_conn_alloc_pdu,
+	.init_pdu			= cxgb4i_conn_init_pdu,
+	.xmit_pdu			= cxgb4i_conn_xmit_pdu,
+	.parse_pdu_itt			= cxgb4i_parse_pdu_itt,
+
+	.ep_connect			= cxgb4i_ep_connect,
+	.ep_poll			= cxgb4i_ep_poll,
+	.ep_disconnect			= cxgb4i_ep_disconnect,
+
+	.session_recovery_timedout	= iscsi_session_recovery_timedout,
+};
+
+int cxgb4i_iscsi_init(void)
+{
+	cxgb4i_scsi_transport = iscsi_register_transport(
+					&cxgb4i_iscsi_transport);
+	if (!cxgb4i_scsi_transport) {
+		cxgb4i_log_error("Could not register cxgb4i transport\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+void cxgb4i_iscsi_cleanup(void)
+{
+	if (cxgb4i_scsi_transport) {
+		cxgb4i_api_debug("cxgb4i transport 0x%p removed\n",
+				cxgb4i_scsi_transport);
+		iscsi_unregister_transport(&cxgb4i_iscsi_transport);
+	}
+}
+
diff --git a/drivers/scsi/cxgb4i/libcxgbi.c b/drivers/scsi/cxgb4i/libcxgbi.c
new file mode 100644
index 0000000..df1ce1d
--- /dev/null
+++ b/drivers/scsi/cxgb4i/libcxgbi.c
@@ -0,0 +1,464 @@
+/*
+ * libcxgbi.c: Chelsio common library for T3/T4 iSCSI ULD.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#include <linux/skbuff.h>
+#include <linux/crypto.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_host.h>
+
+#include "cxgb4i.h"
+#include "libcxgbi.h"
+
+/* always allocate rooms for AHS */
+#define SKB_TX_PDU_HEADER_LEN	\
+	(sizeof(struct iscsi_hdr) + ISCSI_MAX_AHS_SIZE)
+
+static unsigned int skb_extra_headroom;
+static struct page *pad_page;
+
+/*
+ * pdu receive, interact with libiscsi_tcp
+ */
+static inline int read_pdu_skb(struct iscsi_conn *conn, struct sk_buff *skb,
+			       unsigned int offset, int offloaded)
+{
+	int status = 0;
+	int bytes_read;
+
+	bytes_read = iscsi_tcp_recv_skb(conn, skb, offset, offloaded, &status);
+	switch (status) {
+	case ISCSI_TCP_CONN_ERR:
+		return -EIO;
+	case ISCSI_TCP_SUSPENDED:
+		/* no transfer - just have caller flush queue */
+		return bytes_read;
+	case ISCSI_TCP_SKB_DONE:
+		/*
+		 * pdus should always fit in the skb and we should get
+		 * segment done notifcation.
+		 */
+		iscsi_conn_printk(KERN_ERR, conn, "Invalid pdu or skb.");
+		return -EFAULT;
+	case ISCSI_TCP_SEGMENT_DONE:
+		return bytes_read;
+	default:
+		iscsi_conn_printk(KERN_ERR, conn, "Invalid iscsi_tcp_recv_skb "
+				  "status %d\n", status);
+		return -EINVAL;
+	}
+}
+
+static int cxgb4i_conn_read_bhs_pdu_skb(struct iscsi_conn *conn,
+				    struct sk_buff *skb)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	int rc;
+
+	cxgb4i_rx_debug("conn 0x%p, skb 0x%p, len %u, flag 0x%x.\n",
+			conn, skb, skb->len, cxgb4i_skb_ulp_mode(skb));
+
+	if (!iscsi_tcp_recv_segment_is_hdr(tcp_conn)) {
+		iscsi_conn_failure(conn, ISCSI_ERR_PROTO);
+		return -EIO;
+	}
+
+	if (conn->hdrdgst_en && (cxgb4i_skb_ulp_mode(skb) &
+				ULP2_FLAG_HCRC_ERROR)) {
+		iscsi_conn_failure(conn, ISCSI_ERR_HDR_DGST);
+		return -EIO;
+	}
+
+	rc = read_pdu_skb(conn, skb, 0, 0);
+	if (rc <= 0)
+		return rc;
+
+	return 0;
+}
+
+static int cxgb4i_conn_read_data_pdu_skb(struct iscsi_conn *conn,
+				    struct sk_buff *skb)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	bool offloaded = 0;
+	unsigned int offset = 0;
+	int rc;
+
+	cxgb4i_rx_debug("conn 0x%p, skb 0x%p, len %u, flag 0x%x.\n",
+			conn, skb, skb->len, cxgb4i_skb_ulp_mode(skb));
+
+	if (conn->datadgst_en &&
+			(cxgb4i_skb_ulp_mode(skb) & ULP2_FLAG_DCRC_ERROR)) {
+		iscsi_conn_failure(conn, ISCSI_ERR_DATA_DGST);
+		return -EIO;
+	}
+
+	if (iscsi_tcp_recv_segment_is_hdr(tcp_conn))
+		return 0;
+
+	if (conn->hdrdgst_en)
+		offset = ISCSI_DIGEST_SIZE;
+
+	if (cxgb4i_skb_ulp_mode(skb) & ULP2_FLAG_DATA_DDPED) {
+		cxgb4i_rx_debug("skb 0x%p, opcode 0x%x, data %u, ddp'ed, "
+				"itt 0x%x.\n",
+				skb,
+				tcp_conn->in.hdr->opcode & ISCSI_OPCODE_MASK,
+				tcp_conn->in.datalen,
+				ntohl(tcp_conn->in.hdr->itt));
+		offloaded = 1;
+	} else {
+		cxgb4i_rx_debug("skb 0x%p, opcode 0x%x, data %u, NOT ddp'ed, "
+				"itt 0x%x.\n",
+				skb,
+				tcp_conn->in.hdr->opcode & ISCSI_OPCODE_MASK,
+				tcp_conn->in.datalen,
+				ntohl(tcp_conn->in.hdr->itt));
+	}
+
+	rc = read_pdu_skb(conn, skb, 0, offloaded);
+	if (rc < 0)
+		return rc;
+	else
+		return 0;
+}
+
+static inline void tx_skb_setmode(struct sk_buff *skb, int hcrc, int dcrc)
+{
+	u8 submode = 0;
+
+	if (hcrc)
+		submode |= 1;
+	if (dcrc)
+		submode |= 2;
+	cxgb4i_skb_ulp_mode(skb) = (ULP_MODE_ISCSI << 4) | submode;
+}
+
+static int sgl_seek_offset(struct scatterlist *sgl, unsigned int sgcnt,
+				unsigned int offset, unsigned int *off,
+				struct scatterlist **sgp)
+{
+	int i;
+	struct scatterlist *sg;
+
+	for_each_sg(sgl, sg, sgcnt, i) {
+		if (offset < sg->length) {
+			*off = offset;
+			*sgp = sg;
+			return 0;
+		}
+		offset -= sg->length;
+	}
+	return -EFAULT;
+}
+
+static int sgl_read_to_frags(struct scatterlist *sg, unsigned int sgoffset,
+				unsigned int dlen, skb_frag_t *frags,
+				int frag_max)
+{
+	unsigned int datalen = dlen;
+	unsigned int sglen = sg->length - sgoffset;
+	struct page *page = sg_page(sg);
+	int i;
+
+	i = 0;
+	do {
+		unsigned int copy;
+
+		if (!sglen) {
+			sg = sg_next(sg);
+			if (!sg) {
+				cxgb4i_log_error("sg NULL, len %u/%u.\n",
+								datalen, dlen);
+				return -EINVAL;
+			}
+			sgoffset = 0;
+			sglen = sg->length;
+			page = sg_page(sg);
+
+		}
+		copy = min(datalen, sglen);
+		if (i && page == frags[i - 1].page &&
+		    sgoffset + sg->offset ==
+			frags[i - 1].page_offset + frags[i - 1].size) {
+			frags[i - 1].size += copy;
+		} else {
+			if (i >= frag_max) {
+				cxgb4i_log_error("too many pages %u, "
+						 "dlen %u.\n", frag_max, dlen);
+				return -EINVAL;
+			}
+
+			frags[i].page = page;
+			frags[i].page_offset = sg->offset + sgoffset;
+			frags[i].size = copy;
+			i++;
+		}
+		datalen -= copy;
+		sgoffset += copy;
+		sglen -= copy;
+	} while (datalen);
+
+	return i;
+}
+
+int cxgb4i_conn_alloc_pdu(struct iscsi_task *task, u8 opcode)
+{
+	struct iscsi_conn *conn = task->conn;
+	struct iscsi_tcp_task *tcp_task = task->dd_data;
+	struct cxgb4i_task_data *tdata = task->dd_data + sizeof(*tcp_task);
+	struct scsi_cmnd *sc = task->sc;
+	int headroom = SKB_TX_PDU_HEADER_LEN;
+
+	tcp_task->dd_data = tdata;
+	task->hdr = NULL;
+
+	/* write command, need to send data pdus */
+	if (skb_extra_headroom && (opcode == ISCSI_OP_SCSI_DATA_OUT ||
+	    (opcode == ISCSI_OP_SCSI_CMD &&
+	    (scsi_bidi_cmnd(sc) || sc->sc_data_direction == DMA_TO_DEVICE))))
+		headroom += min(skb_extra_headroom, conn->max_xmit_dlength);
+
+	tdata->skb = alloc_skb(TX_HEADER_LEN + headroom, GFP_ATOMIC);
+	if (!tdata->skb)
+		return -ENOMEM;
+	skb_reserve(tdata->skb, TX_HEADER_LEN);
+
+	cxgb4i_tx_debug("task 0x%p, opcode 0x%x, skb 0x%p.\n",
+			task, opcode, tdata->skb);
+
+	task->hdr = (struct iscsi_hdr *)tdata->skb->data;
+	task->hdr_max = SKB_TX_PDU_HEADER_LEN;
+
+	/* data_out uses scsi_cmd's itt */
+	if (opcode != ISCSI_OP_SCSI_DATA_OUT)
+		cxgb4i_reserve_itt(task, &task->hdr->itt);
+
+	return 0;
+}
+
+int cxgb4i_conn_init_pdu(struct iscsi_task *task, unsigned int offset,
+			      unsigned int count)
+{
+	struct iscsi_conn *conn = task->conn;
+	struct iscsi_tcp_task *tcp_task = task->dd_data;
+	struct cxgb4i_task_data *tdata = tcp_task->dd_data;
+	struct sk_buff *skb = tdata->skb;
+	unsigned int datalen = count;
+	int i, padlen = iscsi_padding(count);
+	struct page *pg;
+
+	cxgb4i_tx_debug("task 0x%p,0x%p, offset %u, count %u, skb 0x%p.\n",
+			task, task->sc, offset, count, skb);
+
+	skb_put(skb, task->hdr_len);
+	tx_skb_setmode(skb, conn->hdrdgst_en, datalen ? conn->datadgst_en : 0);
+	if (!count)
+		return 0;
+
+	if (task->sc) {
+		struct scsi_data_buffer *sdb = scsi_out(task->sc);
+		struct scatterlist *sg = NULL;
+		int err;
+
+		tdata->offset = offset;
+		tdata->count = count;
+		err = sgl_seek_offset(sdb->table.sgl, sdb->table.nents,
+					tdata->offset, &tdata->sgoffset, &sg);
+		if (err < 0) {
+			cxgb4i_log_warn("tpdu, sgl %u, bad offset %u/%u.\n",
+					sdb->table.nents, tdata->offset,
+					sdb->length);
+			return err;
+		}
+		err = sgl_read_to_frags(sg, tdata->sgoffset, tdata->count,
+					tdata->frags, MAX_PDU_FRAGS);
+		if (err < 0) {
+			cxgb4i_log_warn("tpdu, sgl %u, bad offset %u + %u.\n",
+					sdb->table.nents, tdata->offset,
+					tdata->count);
+			return err;
+		}
+		tdata->nr_frags = err;
+
+		if (tdata->nr_frags > MAX_SKB_FRAGS ||
+		    (padlen && tdata->nr_frags == MAX_SKB_FRAGS)) {
+			char *dst = skb->data + task->hdr_len;
+			skb_frag_t *frag = tdata->frags;
+
+			/* data fits in the skb's headroom */
+			for (i = 0; i < tdata->nr_frags; i++, frag++) {
+				char *src = kmap_atomic(frag->page,
+							KM_SOFTIRQ0);
+
+				memcpy(dst, src+frag->page_offset, frag->size);
+				dst += frag->size;
+				kunmap_atomic(src, KM_SOFTIRQ0);
+			}
+			if (padlen) {
+				memset(dst, 0, padlen);
+				padlen = 0;
+			}
+			skb_put(skb, count + padlen);
+		} else {
+			/* data fit into frag_list */
+			for (i = 0; i < tdata->nr_frags; i++)
+				get_page(tdata->frags[i].page);
+
+			memcpy(skb_shinfo(skb)->frags, tdata->frags,
+				sizeof(skb_frag_t) * tdata->nr_frags);
+			skb_shinfo(skb)->nr_frags = tdata->nr_frags;
+			skb->len += count;
+			skb->data_len += count;
+			skb->truesize += count;
+		}
+
+	} else {
+		pg = virt_to_page(task->data);
+
+		get_page(pg);
+		skb_fill_page_desc(skb, 0, pg, offset_in_page(task->data),
+					count);
+		skb->len += count;
+		skb->data_len += count;
+		skb->truesize += count;
+	}
+
+	if (padlen) {
+		i = skb_shinfo(skb)->nr_frags;
+		get_page(pad_page);
+		skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, pad_page, 0,
+				 padlen);
+
+		skb->data_len += padlen;
+		skb->truesize += padlen;
+		skb->len += padlen;
+	}
+
+	return 0;
+}
+
+int cxgb4i_conn_xmit_pdu(struct iscsi_task *task)
+{
+	struct iscsi_tcp_conn *tcp_conn = task->conn->dd_data;
+	struct cxgb4i_conn *cconn = tcp_conn->dd_data;
+	struct iscsi_tcp_task *tcp_task = task->dd_data;
+	struct cxgb4i_task_data *tdata = tcp_task->dd_data;
+	struct sk_buff *skb = tdata->skb;
+	unsigned int datalen;
+	int err;
+
+	if (!skb)
+		return 0;
+
+	datalen = skb->data_len;
+	tdata->skb = NULL;
+	err = cxgb4i_sock_send_pdus(cconn->cep->csk, skb);
+	if (err > 0) {
+		int pdulen = err;
+
+		cxgb4i_tx_debug("task 0x%p, skb 0x%p, len %u/%u, rv %d.\n",
+				task, skb, skb->len, skb->data_len, err);
+
+		if (task->conn->hdrdgst_en)
+			pdulen += ISCSI_DIGEST_SIZE;
+		if (datalen && task->conn->datadgst_en)
+			pdulen += ISCSI_DIGEST_SIZE;
+
+		task->conn->txdata_octets += pdulen;
+		return 0;
+	}
+
+	if (err == -EAGAIN || err == -ENOBUFS) {
+		/* reset skb to send when we are called again */
+		tdata->skb = skb;
+		return err;
+	}
+
+	kfree_skb(skb);
+	cxgb4i_tx_debug("itt 0x%x, skb 0x%p, len %u/%u, xmit err %d.\n",
+			task->itt, skb, skb->len, skb->data_len, err);
+	iscsi_conn_printk(KERN_ERR, task->conn, "xmit err %d.\n", err);
+	iscsi_conn_failure(task->conn, ISCSI_ERR_XMIT_FAILED);
+	return err;
+}
+
+int cxgb4i_pdu_init(void)
+{
+	if (SKB_TX_HEADROOM > (512 * MAX_SKB_FRAGS))
+		skb_extra_headroom = SKB_TX_HEADROOM;
+	pad_page = alloc_page(GFP_KERNEL);
+	if (!pad_page)
+		return -ENOMEM;
+	memset(page_address(pad_page), 0, PAGE_SIZE);
+	return 0;
+}
+
+void cxgb4i_pdu_cleanup(void)
+{
+	if (pad_page) {
+		__free_page(pad_page);
+		pad_page = NULL;
+	}
+}
+
+void cxgbi_conn_pdu_ready(struct cxgb4i_sock *csk)
+{
+	struct sk_buff *skb;
+	unsigned int read = 0;
+	struct iscsi_conn *conn = csk->user_data;
+	int err = 0;
+
+	cxgb4i_rx_debug("csk 0x%p.\n", csk);
+
+	read_lock(&csk->callback_lock);
+	if (unlikely(!conn || conn->suspend_rx)) {
+		cxgb4i_rx_debug("conn 0x%p, id %d, suspend_rx %lu!\n",
+				conn, conn ? conn->id : 0xFF,
+				conn ? conn->suspend_rx : 0xFF);
+		read_unlock(&csk->callback_lock);
+		return;
+	}
+	skb = skb_peek(&csk->receive_queue);
+	while (!err && skb) {
+		__skb_unlink(skb, &csk->receive_queue);
+		read += cxgb4i_skb_rx_pdulen(skb);
+		cxgb4i_rx_debug("conn 0x%p, csk 0x%p, rx skb 0x%p, pdulen %u\n",
+				conn, csk, skb, cxgb4i_skb_rx_pdulen(skb));
+		if (cxgb4i_skb_flags(skb) & CXGB4I_SKCB_FLAG_HDR_RCVD)
+			err = cxgb4i_conn_read_bhs_pdu_skb(conn, skb);
+		else if (cxgb4i_skb_flags(skb) == CXGB4I_SKCB_FLAG_DATA_RCVD)
+			err = cxgb4i_conn_read_data_pdu_skb(conn, skb);
+		__kfree_skb(skb);
+		skb = skb_peek(&csk->receive_queue);
+	}
+	read_unlock(&csk->callback_lock);
+	csk->copied_seq += read;
+	cxgb4i_sock_rx_credits(csk, read);
+	conn->rxdata_octets += read;
+
+	if (err) {
+		cxgb4i_log_info("conn 0x%p rx failed err %d.\n", conn, err);
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+	}
+}
+
+void cxgbi_conn_tx_open(struct cxgb4i_sock *csk)
+{
+	struct iscsi_conn *conn = csk->user_data;
+
+	if (conn) {
+		cxgb4i_tx_debug("cn 0x%p, cid %d.\n", csk, conn->id);
+		iscsi_conn_queue_work(conn);
+	}
+}
+
diff --git a/drivers/scsi/cxgb4i/libcxgbi.h b/drivers/scsi/cxgb4i/libcxgbi.h
new file mode 100644
index 0000000..e6c12c6
--- /dev/null
+++ b/drivers/scsi/cxgb4i/libcxgbi.h
@@ -0,0 +1,61 @@
+/*
+ * libcxgbi.h: Chelsio common library for T3/T4 iSCSI ULD.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#ifndef	__CXGB4I_PDU_H__
+#define	__CXGB4I_PDU_H__
+
+struct cpl_rx_data_ddp {
+	union opcode_tid ot;
+	__be16 urg;
+	__be16 len;
+	__be32 seq;
+	union {
+		__be32 nxt_seq;
+		__be32 ddp_report;
+	};
+	__be32 ulp_crc;
+	__be32 ddpvld;
+};
+
+#define RX_DDP_STATUS_IPP_SHIFT		27      /* invalid pagepod */
+#define RX_DDP_STATUS_TID_SHIFT		26      /* tid mismatch */
+#define RX_DDP_STATUS_COLOR_SHIFT	25      /* color mismatch */
+#define RX_DDP_STATUS_OFFSET_SHIFT	24      /* offset mismatch */
+#define RX_DDP_STATUS_ULIMIT_SHIFT	23      /* ulimit error */
+#define RX_DDP_STATUS_TAG_SHIFT		22      /* tag mismatch */
+#define RX_DDP_STATUS_DCRC_SHIFT	21      /* dcrc error */
+#define RX_DDP_STATUS_HCRC_SHIFT	20      /* hcrc error */
+#define RX_DDP_STATUS_PAD_SHIFT		19      /* pad error */
+#define RX_DDP_STATUS_PPP_SHIFT		18      /* pagepod parity error */
+#define RX_DDP_STATUS_LLIMIT_SHIFT	17      /* llimit error */
+#define RX_DDP_STATUS_DDP_SHIFT		16      /* ddp'able */
+#define RX_DDP_STATUS_PMM_SHIFT		15      /* pagepod mismatch */
+
+
+#define ULP2_FLAG_DATA_READY		0x1
+#define ULP2_FLAG_DATA_DDPED		0x2
+#define ULP2_FLAG_HCRC_ERROR		0x4
+#define ULP2_FLAG_DCRC_ERROR		0x8
+#define ULP2_FLAG_PAD_ERROR		0x10
+
+void cxgbi_conn_closing(struct cxgb4i_sock *);
+void cxgbi_conn_pdu_ready(struct cxgb4i_sock *);
+void cxgbi_conn_tx_open(struct cxgb4i_sock *);
+
+int cxgbi_init_pdu(struct iscsi_task *, unsigned int, unsigned int);
+int cxgbi_alloc_pdu(struct iscsi_task *, u8);
+int cxgbi_xmit_pdu(struct iscsi_task *);
+
+
+#endif	/*__CXGB4I_PDU_H__*/
+
-- 
1.6.6.1

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply related

* [PATCH 2/3] cxgb4i: main driver files
From: Rakesh Ranjan @ 2010-05-04 16:43 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, kxie-ut6Up61K2wZBDgjK7y7TUQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk,
	michaelc-hcNo3dDEHLuVc3sceRu5cw, Rakesh Ranjan
In-Reply-To: <1272991388-8527-2-git-send-email-rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

From: Rakesh Ranjan <rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>


Signed-off-by: Rakesh Ranjan <rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
 drivers/scsi/cxgb4i/cxgb4i.h         |  201 ++++
 drivers/scsi/cxgb4i/cxgb4i_ddp.c     |  696 +++++++++++++
 drivers/scsi/cxgb4i/cxgb4i_ddp.h     |  238 +++++
 drivers/scsi/cxgb4i/cxgb4i_offload.c | 1853 ++++++++++++++++++++++++++++++++++
 drivers/scsi/cxgb4i/cxgb4i_offload.h |  171 ++++
 drivers/scsi/cxgb4i/cxgb4i_snic.c    |  253 +++++
 6 files changed, 3412 insertions(+), 0 deletions(-)
 create mode 100644 drivers/scsi/cxgb4i/cxgb4i.h
 create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.c
 create mode 100644 drivers/scsi/cxgb4i/cxgb4i_ddp.h
 create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.c
 create mode 100644 drivers/scsi/cxgb4i/cxgb4i_offload.h
 create mode 100644 drivers/scsi/cxgb4i/cxgb4i_snic.c

diff --git a/drivers/scsi/cxgb4i/cxgb4i.h b/drivers/scsi/cxgb4i/cxgb4i.h
new file mode 100644
index 0000000..8007284
--- /dev/null
+++ b/drivers/scsi/cxgb4i/cxgb4i.h
@@ -0,0 +1,201 @@
+/*
+ * cxgb4i.h: Chelsio T4 iSCSI driver.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#ifndef	__CXGB4I_H__
+#define	__CXGB4I_H__
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/if_vlan.h>
+#include <linux/scatterlist.h>
+#include <linux/skbuff.h>
+#include <scsi/libiscsi_tcp.h>
+
+#include "t4fw_api.h"
+#include "t4_msg.h"
+#include "l2t.h"
+#include "cxgb4.h"
+#include "cxgb4_uld.h"
+
+#include "cxgb4i_ddp.h"
+#include "cxgb4i_offload.h"
+
+#define	cxgb4i_log_error(fmt...)	printk(KERN_ERR "cxgb4i: ERR! " fmt)
+#define cxgb4i_log_warn(fmt...)	printk(KERN_WARNING "cxgb4i: WARN! " fmt)
+#define cxgb4i_log_info(fmt...)	printk(KERN_INFO "cxgb4i: " fmt)
+#define cxgb4i_debug_log(fmt, args...) \
+	printk(KERN_INFO "cxgb4i: %s - " fmt, __func__ , ## args)
+
+
+#ifdef	__DEBUG_CXGB4I__
+#define	cxgb4i_log_debug	cxgb4i_debug_log
+#else
+#define cxgb4i_log_debug(fmt...)
+#endif
+
+#ifdef __DEBUG_CXGB4I_TAG__
+#define cxgb4i_tag_debug        cxgb4i_log_debug
+#else
+#define cxgb4i_tag_debug(fmt...)
+#endif
+
+#ifdef __DEBUG_CXGB4I_API__
+#define cxgb4i_api_debug        cxgb4i_log_debug
+#else
+#define cxgb4i_api_debug(fmt...)
+#endif
+
+#ifdef __DEBUG_CXGB4I_CONN__
+#define cxgb4i_conn_debug         cxgb4i_log_debug
+#else
+#define cxgb4i_conn_debug(fmt...)
+#endif
+
+#ifdef __DEBUG_CXGB4I_TX__
+#define cxgb4i_tx_debug           cxgb4i_log_debug
+#else
+#define cxgb4i_tx_debug(fmt...)
+#endif
+
+#ifdef __DEBUG_CXGB4I_RX__
+#define cxgb4i_rx_debug           cxgb4i_log_debug
+#else
+#define cxgb4i_rx_debug(fmt...)
+#endif
+
+#define	CXGB4I_SCSI_HOST_QDEPTH	1024
+#define	CXGB4I_MAX_TARGET	CXGB4I_MAX_CONN
+#define	CXGB4I_MAX_LUN		512
+#define	ISCSI_PDU_NONPAYLOAD_MAX \
+	(sizeof(struct iscsi_hdr) + ISCSI_MAX_AHS_SIZE + \
+	 (2 * ISCSI_DIGEST_SIZE))
+
+struct cxgb4i_snic;
+struct cxgb4i_host;
+struct cxgb4i_endpoint;
+typedef int (*cxgb4i_cplhandler_func)(struct cxgb4i_snic *, struct sk_buff *);
+
+
+struct cxgb4i_hba {
+	struct cxgb4i_snic *snic;
+	struct net_device *ndev;
+	struct Scsi_Host *shost;
+	struct port_info *pinfo;
+	__be32 ipv4addr;
+};
+
+struct cxgb4i_ports_map {
+	spinlock_t lock;
+	unsigned int next;
+	struct cxgb4i_sock *port_csk[0];
+};
+
+struct cxgb4i_snic {
+	struct list_head list_head;
+	spinlock_t lock;
+
+	struct cxgb4_lld_info lldi;
+
+	struct cxgb4i_hba *hba[MAX_NPORTS];
+	unsigned char hba_cnt;
+
+	unsigned int flags;
+	unsigned int tx_max_size;
+	unsigned int rx_max_size;
+
+	struct cxgb4i_tag_format tag_format;
+	struct cxgb4i_ddp_info *ddp;
+
+	struct cxgb4i_ports_map *pmap;
+	cxgb4i_cplhandler_func *funcs;
+};
+
+struct cxgb4i_conn {
+	struct list_head list_head;
+	struct cxgb4i_endpoint *cep;
+	struct iscsi_conn *iconn;
+	struct cxgb4i_hba *chba;
+	u32 task_idx_bits;
+};
+
+struct cxgb4i_endpoint {
+	struct cxgb4i_conn *cconn;
+	struct cxgb4i_hba *chba;
+	struct cxgb4i_sock *csk;
+};
+
+#define MAX_PDU_FRAGS	((ULP2_MAX_PDU_PAYLOAD + 512 - 1) / 512)
+struct cxgb4i_task_data {
+	unsigned short nr_frags;
+	skb_frag_t frags[MAX_PDU_FRAGS];
+	struct sk_buff *skb;
+	unsigned int offset;
+	unsigned int count;
+	unsigned int sgoffset;
+};
+
+int cxgb4i_ofld_init(struct cxgb4i_snic *);
+void cxgb4i_ofld_cleanup(struct cxgb4i_snic *);
+struct cxgb4i_snic *cxgb4i_snic_init(const struct cxgb4_lld_info *);
+void cxgb4i_snic_cleanup(void);
+struct cxgb4i_snic *cxgb4i_find_snic(struct net_device *, __be32);
+struct cxgb4i_hba *cxgb4i_hba_find_by_netdev(struct net_device *);
+struct cxgb4i_hba *cxgb4i_hba_add(struct cxgb4i_snic *, struct net_device *);
+void cxgb4i_hba_remove(struct cxgb4i_hba *);
+int cxgb4i_iscsi_init(void);
+void cxgb4i_iscsi_cleanup(void);
+
+int cxgb4i_pdu_init(void);
+void cxgb4i_pdu_cleanup(void);
+int cxgb4i_conn_alloc_pdu(struct iscsi_task *, u8);
+int cxgb4i_conn_init_pdu(struct iscsi_task *, unsigned int, unsigned int);
+int cxgb4i_conn_xmit_pdu(struct iscsi_task *);
+int cxgb4i_xmit_pdu_dummy(struct iscsi_task *task);
+
+int cxgb4i_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt);
+void cxgb4i_release_itt(struct iscsi_task *task, itt_t hdr_itt);
+
+int cxgb4i_sport_init(struct cxgb4i_snic *);
+
+static inline void *cplhdr(struct sk_buff *skb)
+{
+	return skb->data;
+}
+
+static inline void cxgb4i_set_iscsi_ipv4(struct cxgb4i_hba *chba, __be32 ipaddr)
+{
+	chba->ipv4addr = ipaddr;
+}
+
+static inline __be32 cxgb4i_get_iscsi_ipv4(struct cxgb4i_hba *chba)
+{
+	return chba->ipv4addr;
+}
+
+#define W_TCB_ULP_TYPE          0
+#define TCB_ULP_TYPE_SHIFT      0
+#define TCB_ULP_TYPE_MASK       0xfULL
+#define TCB_ULP_TYPE(x)         ((x) << TCB_ULP_TYPE_SHIFT)
+
+#define W_TCB_ULP_RAW           0
+#define TCB_ULP_RAW_SHIFT       4
+#define TCB_ULP_RAW_MASK        0xffULL
+#define TCB_ULP_RAW(x)          ((x) << TCB_ULP_RAW_SHIFT)
+
+
+#endif	/* __CXGB4I_H__ */
+
diff --git a/drivers/scsi/cxgb4i/cxgb4i_ddp.c b/drivers/scsi/cxgb4i/cxgb4i_ddp.c
new file mode 100644
index 0000000..8387822
--- /dev/null
+++ b/drivers/scsi/cxgb4i/cxgb4i_ddp.c
@@ -0,0 +1,696 @@
+/*
+ * cxgb4i_ddp.c: Chelsio T4 iSCSI driver.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#include <linux/skbuff.h>
+#include <linux/scatterlist.h>
+
+#include "cxgb4i.h"
+#include "cxgb4i_ddp.h"
+
+#define DDP_PGIDX_MAX	4
+#define DDP_THRESHOLD	2048
+
+static unsigned char ddp_page_order[DDP_PGIDX_MAX] = {0, 1, 2, 4};
+static unsigned char ddp_page_shift[DDP_PGIDX_MAX] = {12, 13, 14, 16};
+static unsigned char page_idx = DDP_PGIDX_MAX;
+
+static unsigned char sw_tag_idx_bits;
+static unsigned char sw_tag_age_bits;
+
+
+static inline void cxgb4i_ddp_ppod_set(struct pagepod *ppod,
+					struct pagepod_hdr *hdr,
+					struct cxgb4i_gather_list *gl,
+					unsigned int pidx)
+{
+	int i;
+
+	memcpy(ppod, hdr, sizeof(*hdr));
+	for (i = 0; i < (PPOD_PAGES_MAX + 1); i++, pidx++) {
+		ppod->addr[i] = pidx < gl->nelem ?
+			cpu_to_be64(gl->phys_addr[pidx]) : 0ULL;
+	}
+}
+
+static inline void cxgb4i_ddp_ppod_clear(struct pagepod *ppod)
+{
+	memset(ppod, 0, sizeof(*ppod));
+}
+
+static inline void cxgb4i_ddp_ulp_mem_io_set_hdr(struct ulp_mem_io *req,
+					unsigned int wr_len, unsigned int dlen,
+					unsigned int pm_addr)
+{
+	struct ulptx_sgl *sgl;
+
+	INIT_ULPTX_WR(req, wr_len, 0, 0);
+	req->cmd = htonl(ULPTX_CMD(ULP_TX_MEM_WRITE));
+	req->dlen = htonl(ULP_MEMIO_DATA_LEN(dlen >> 5));
+	req->len16 = htonl(DIV_ROUND_UP(wr_len - sizeof(req->wr), 16));
+	req->lock_addr = htonl(ULP_MEMIO_ADDR(pm_addr >> 5));
+
+	sgl = (struct ulptx_sgl *)(req + 1);
+	sgl->cmd_nsge = htonl(ULPTX_CMD(ULP_TX_SC_DSGL) | ULPTX_NSGE(1));
+	sgl->len0 = htonl(dlen);
+}
+
+static int cxgb4i_ddp_ppod_write_sgl(struct cxgb4i_ddp_info *ddp,
+					struct pagepod_hdr *hdr,
+					unsigned int idx,
+					unsigned int npods,
+					struct cxgb4i_gather_list *gl,
+					unsigned int gl_pidx)
+{
+	unsigned int dlen = PPOD_SIZE * npods;
+	unsigned int pm_addr = idx * PPOD_SIZE + ddp->llimit;
+	unsigned int wr_len = roundup(sizeof(struct ulp_mem_io) +
+					sizeof(struct ulptx_sgl), 16);
+	struct sk_buff *skb = alloc_skb(wr_len + dlen, GFP_ATOMIC);
+	struct ulp_mem_io *req;
+	struct ulptx_sgl *sgl;
+	struct pagepod *ppod;
+	unsigned int i;
+
+	if (!skb) {
+		cxgb4i_log_error("snic 0x%p, idx %u, npods %u, OOM\n",
+				ddp->snic, idx, npods);
+		return -ENOMEM;
+	}
+
+	memset(skb->data, 0, wr_len + dlen);
+	skb->queue_mapping = CPL_PRIORITY_CONTROL;
+
+	req = (struct ulp_mem_io *)__skb_put(skb, wr_len);
+	cxgb4i_ddp_ulp_mem_io_set_hdr(req, wr_len, dlen, pm_addr);
+	sgl = (struct ulptx_sgl *)(req + 1);
+	ppod = (struct pagepod *)(sgl + 1);
+	sgl->addr0 = cpu_to_be64(virt_to_phys(ppod));
+
+	for (i = 0; i < npods; i++, ppod++, gl_pidx += PPOD_PAGES_MAX) {
+		if (!hdr && !gl)
+			cxgb4i_ddp_ppod_clear(ppod);
+		else
+			cxgb4i_ddp_ppod_set(ppod, hdr, gl, gl_pidx);
+
+	}
+
+	cxgb4_ofld_send(ddp->snic->lldi.ports[0], skb);
+
+	return 0;
+}
+
+static int cxgb4i_ddp_set_map(struct cxgb4i_ddp_info *ddp,
+					struct pagepod_hdr *hdr,
+					unsigned int idx,
+					unsigned int npods,
+					struct cxgb4i_gather_list *gl)
+{
+	unsigned int pidx = 0;
+	unsigned int w_npods = 0;
+	unsigned int cnt;
+	int err = 0;
+
+	for (; w_npods < npods; idx += cnt, w_npods += cnt,
+					pidx += PPOD_PAGES_MAX) {
+		cnt = npods - w_npods;
+		if (cnt > ULPMEM_DSGL_MAX_NPPODS)
+			cnt = ULPMEM_DSGL_MAX_NPPODS;
+		err = cxgb4i_ddp_ppod_write_sgl(ddp, hdr, idx, cnt, gl, pidx);
+
+		if (err < 0)
+			break;
+	}
+
+	return err;
+}
+
+static void cxgb4i_ddp_clear_map(struct cxgb4i_ddp_info *ddp,
+						unsigned int tag,
+						unsigned int idx,
+						unsigned int npods)
+{
+	int err;
+	unsigned int w_npods = 0;
+	unsigned int cnt;
+
+	for (; w_npods < npods; idx += cnt, w_npods += cnt) {
+		cnt = npods - w_npods;
+
+		if (cnt > ULPMEM_DSGL_MAX_NPPODS)
+			cnt = ULPMEM_DSGL_MAX_NPPODS;
+		err = cxgb4i_ddp_ppod_write_sgl(ddp, NULL, idx, cnt, NULL, 0);
+
+		if (err < 0)
+			break;
+	}
+}
+
+static inline int cxgb4i_ddp_find_unused_entries(struct cxgb4i_ddp_info *ddp,
+					unsigned int start, unsigned int max,
+					unsigned int count,
+					struct cxgb4i_gather_list *gl)
+{
+	unsigned int i, j, k;
+
+	/*  not enough entries */
+	if ((max - start) < count)
+		return -EBUSY;
+
+	max -= count;
+	spin_lock(&ddp->map_lock);
+	for (i = start; i < max;) {
+		for (j = 0, k = i; j < count; j++, k++) {
+			if (ddp->gl_map[k])
+				break;
+		}
+		if (j == count) {
+			for (j = 0, k = i; j < count; j++, k++)
+				ddp->gl_map[k] = gl;
+			spin_unlock(&ddp->map_lock);
+			return i;
+		}
+		i += j + 1;
+	}
+	spin_unlock(&ddp->map_lock);
+	return -EBUSY;
+}
+
+static inline void cxgb4i_ddp_unmark_entries(struct cxgb4i_ddp_info *ddp,
+							int start, int count)
+{
+	spin_lock(&ddp->map_lock);
+	memset(&ddp->gl_map[start], 0,
+			count * sizeof(struct cxgb4i_gather_list *));
+	spin_unlock(&ddp->map_lock);
+}
+
+static int cxgb4i_ddp_find_page_index(unsigned long pgsz)
+{
+	int i;
+
+	for (i = 0; i < DDP_PGIDX_MAX; i++) {
+		if (pgsz == (1UL << ddp_page_shift[i]))
+			return i;
+	}
+	cxgb4i_log_debug("ddp page size 0x%lx not supported\n", pgsz);
+
+	return DDP_PGIDX_MAX;
+}
+
+static int cxgb4i_ddp_adjust_page_table(void)
+{
+	int i;
+	unsigned int base_order, order;
+
+	if (PAGE_SIZE < (1UL << ddp_page_shift[0])) {
+		cxgb4i_log_info("PAGE_SIZE 0x%lx too small, min 0x%lx\n",
+				PAGE_SIZE, 1UL << ddp_page_shift[0]);
+		return -EINVAL;
+	}
+
+	base_order = get_order(1UL << ddp_page_shift[0]);
+	order = get_order(1UL << PAGE_SHIFT);
+
+	for (i = 0; i < DDP_PGIDX_MAX; i++) {
+		/* first is the kernel page size, then just doubling the size */
+		ddp_page_order[i] = order - base_order + i;
+		ddp_page_shift[i] = PAGE_SHIFT + i;
+	}
+
+	return 0;
+}
+
+static inline void cxgb4i_ddp_gl_unmap(struct pci_dev *pdev,
+					struct cxgb4i_gather_list *gl)
+{
+	int i;
+
+	for (i = 0; i < gl->nelem; i++)
+		dma_unmap_page(&pdev->dev, gl->phys_addr[i], PAGE_SIZE,
+				PCI_DMA_FROMDEVICE);
+}
+
+static inline int cxgb4i_ddp_gl_map(struct pci_dev *pdev,
+				struct cxgb4i_gather_list *gl)
+{
+	int i;
+
+	for (i = 0; i < gl->nelem; i++) {
+		gl->phys_addr[i] = dma_map_page(&pdev->dev, gl->pages[i], 0,
+						PAGE_SIZE,
+						PCI_DMA_FROMDEVICE);
+		if (unlikely(dma_mapping_error(&pdev->dev, gl->phys_addr[i])))
+			goto unmap;
+	}
+
+	return i;
+
+unmap:
+	if (i) {
+		unsigned int nelem = gl->nelem;
+
+		gl->nelem = i;
+		cxgb4i_ddp_gl_unmap(pdev, gl);
+		gl->nelem = nelem;
+	}
+	return -ENOMEM;
+}
+
+struct cxgb4i_gather_list *cxgb4i_ddp_make_gl(unsigned int xferlen,
+						struct scatterlist *sgl,
+						unsigned int sgcnt,
+						struct pci_dev *pdev,
+						gfp_t gfp)
+{
+	struct cxgb4i_gather_list *gl;
+	struct scatterlist *sg = sgl;
+	struct page *sgpage = sg_page(sg);
+	unsigned int sglen = sg->length;
+	unsigned int sgoffset = sg->offset;
+	unsigned int npages = (xferlen + sgoffset + PAGE_SIZE - 1) >>
+				PAGE_SHIFT;
+	int i = 1, j = 0;
+
+	if (xferlen < DDP_THRESHOLD) {
+		cxgb4i_log_debug("xfer %u < threshold %u, no ddp.\n",
+				xferlen, DDP_THRESHOLD);
+		return NULL;
+	}
+
+	gl = kzalloc(sizeof(struct cxgb4i_gather_list) +
+			npages * (sizeof(dma_addr_t) + sizeof(struct page *)),
+			gfp);
+	if (!gl)
+		return NULL;
+
+	gl->pages = (struct page **)&gl->phys_addr[npages];
+	gl->length = xferlen;
+	gl->offset = sgoffset;
+	gl->pages[0] = sgpage;
+
+	sg = sg_next(sg);
+	while (sg) {
+		struct page *page = sg_page(sg);
+
+		if (sgpage == page && sg->offset == sgoffset + sglen)
+			sglen += sg->length;
+		else {
+			/*  make sure the sgl is fit for ddp:
+			 *  each has the same page size, and
+			 *  all of the middle pages are used completely
+			 */
+			if ((j && sgoffset) || ((i != sgcnt - 1) &&
+					 ((sglen + sgoffset) & ~PAGE_MASK)))
+				goto error_out;
+
+			j++;
+			if (j == gl->nelem || sg->offset)
+				goto error_out;
+			gl->pages[j] = page;
+			sglen = sg->length;
+			sgoffset = sg->offset;
+			sgpage = page;
+		}
+		i++;
+		sg = sg_next(sg);
+	}
+	gl->nelem = ++j;
+
+	if (cxgb4i_ddp_gl_map(pdev, gl) < 0)
+		goto error_out;
+
+	return gl;
+
+error_out:
+	kfree(gl);
+	return NULL;
+}
+
+void cxgb4i_ddp_release_gl(struct cxgb4i_gather_list *gl,
+				struct pci_dev *pdev)
+{
+	cxgb4i_ddp_gl_unmap(pdev, gl);
+	kfree(gl);
+}
+
+int cxgb4i_ddp_tag_reserve(struct cxgb4i_snic *snic, unsigned int tid,
+				struct cxgb4i_tag_format *tformat, u32 *tagp,
+				struct cxgb4i_gather_list *gl, gfp_t gfp)
+{
+	struct cxgb4i_ddp_info *ddp = snic->ddp;
+	struct pagepod_hdr hdr;
+	unsigned int npods;
+	int idx = -1;
+	int err = -ENOMEM;
+	u32 sw_tag = *tagp;
+	u32 tag;
+
+	if (page_idx >= DDP_PGIDX_MAX || !ddp || !gl || !gl->nelem ||
+			gl->length < DDP_THRESHOLD) {
+		cxgb4i_log_debug("pgidx %u, xfer %u/%u, NO ddp.\n",
+				page_idx, gl->length, DDP_THRESHOLD);
+		return -EINVAL;
+	}
+
+	npods = (gl->nelem + PPOD_PAGES_MAX - 1) >> PPOD_PAGES_SHIFT;
+
+	if (ddp->idx_last == ddp->nppods)
+		idx = cxgb4i_ddp_find_unused_entries(ddp, 0, ddp->nppods,
+							npods, gl);
+	else {
+		idx = cxgb4i_ddp_find_unused_entries(ddp, ddp->idx_last + 1,
+							ddp->nppods, npods,
+							gl);
+		if (idx < 0 && ddp->idx_last >= npods) {
+			idx = cxgb4i_ddp_find_unused_entries(ddp, 0,
+				min(ddp->idx_last + npods, ddp->nppods),
+							npods, gl);
+		}
+	}
+	if (idx < 0) {
+		cxgb4i_log_debug("xferlen %u, gl %u, npods %u NO DDP.\n",
+				gl->length, gl->nelem, npods);
+		return idx;
+	}
+
+	tag = cxgb4i_ddp_tag_base(tformat, sw_tag);
+	tag |= idx << PPOD_IDX_SHIFT;
+
+	hdr.rsvd = 0;
+	hdr.vld_tid = htonl(PPOD_VALID_FLAG | PPOD_TID(tid));
+	hdr.pgsz_tag_clr = htonl(tag & ddp->rsvd_tag_mask);
+	hdr.max_offset = htonl(gl->length);
+	hdr.page_offset = htonl(gl->offset);
+
+	err = cxgb4i_ddp_set_map(ddp, &hdr, idx, npods, gl);
+	if (err < 0)
+		goto unmark_entries;
+
+	ddp->idx_last = idx;
+	cxgb4i_log_debug("xfer %u, gl %u,%u, tid 0x%x, 0x%x -> 0x%x(%u,%u).\n",
+			gl->length, gl->nelem, gl->offset, tid, sw_tag, tag,
+			idx, npods);
+	*tagp = tag;
+	return 0;
+
+unmark_entries:
+	cxgb4i_ddp_unmark_entries(ddp, idx, npods);
+	return err;
+}
+
+void cxgb4i_ddp_tag_release(struct cxgb4i_snic *snic, u32 tag)
+{
+	struct cxgb4i_ddp_info *ddp = snic->ddp;
+	u32 idx;
+
+	if (!ddp) {
+		cxgb4i_log_error("release ddp tag 0x%x, ddp NULL.\n", tag);
+		return;
+	}
+
+	idx = (tag >> PPOD_IDX_SHIFT) & ddp->idx_mask;
+	if (idx < ddp->nppods) {
+		struct cxgb4i_gather_list *gl = ddp->gl_map[idx];
+		unsigned int npods;
+
+		if (!gl || !gl->nelem) {
+			cxgb4i_log_error("rel 0x%x, idx 0x%x, gl 0x%p, %u\n",
+					tag, idx, gl, gl ? gl->nelem : 0);
+			return;
+		}
+		npods = (gl->nelem + PPOD_PAGES_MAX - 1) >> PPOD_PAGES_SHIFT;
+		cxgb4i_log_debug("ddp tag 0x%x, release idx 0x%x, npods %u.\n",
+				tag, idx, npods);
+		cxgb4i_ddp_clear_map(ddp, tag, idx, npods);
+		cxgb4i_ddp_unmark_entries(ddp, idx, npods);
+		cxgb4i_ddp_release_gl(gl, ddp->pdev);
+	} else
+		cxgb4i_log_error("ddp tag 0x%x, idx 0x%x > max 0x%x.\n",
+				tag, idx, ddp->nppods);
+}
+
+static int cxgb4i_ddp_setup_conn_pgidx(struct cxgb4i_sock *csk,
+					unsigned int tid, int pg_idx,
+					bool reply)
+{
+	struct sk_buff *skb = alloc_skb(sizeof(struct cpl_set_tcb_field),
+					GFP_KERNEL);
+	struct cpl_set_tcb_field *req;
+	u64 val = pg_idx < DDP_PGIDX_MAX ? pg_idx : 0;
+
+	if (!skb)
+		return -ENOMEM;
+
+	/*  set up ulp submode and page size */
+	val = (val & 0x03) << 2;
+	val |= TCB_ULP_TYPE(ULP_MODE_ISCSI);
+	req = (struct cpl_set_tcb_field *)skb_put(skb, sizeof(*req));
+	INIT_TP_WR(req, tid);
+	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SET_TCB_FIELD, csk->hwtid));
+	req->reply_ctrl = htons(NO_REPLY(reply) | QUEUENO(csk->rss_qid));
+	req->word_cookie = htons(TCB_WORD(W_TCB_ULP_RAW));
+	req->mask = cpu_to_be64(TCB_ULP_TYPE(TCB_ULP_TYPE_MASK));
+	req->val = cpu_to_be64(val);
+
+	skb->queue_mapping = CPL_PRIORITY_CONTROL;
+
+	cxgb4_ofld_send(csk->snic->lldi.ports[0], skb);
+	return 0;
+}
+
+int cxgb4i_ddp_setup_conn_host_pagesize(struct cxgb4i_sock *csk,
+						unsigned int tid,
+						int reply)
+{
+	return cxgb4i_ddp_setup_conn_pgidx(csk, tid, page_idx, reply);
+}
+
+int cxgb4i_ddp_setup_conn_pagesize(struct cxgb4i_sock *csk, unsigned int tid,
+					int reply, unsigned long pgsz)
+{
+	int pgidx = cxgb4i_ddp_find_page_index(pgsz);
+
+	return cxgb4i_ddp_setup_conn_pgidx(csk, tid, pgidx, reply);
+}
+
+int cxgb4i_ddp_setup_conn_digest(struct cxgb4i_sock *csk, unsigned int tid,
+				int hcrc, int dcrc, int reply)
+{
+	struct sk_buff *skb = alloc_skb(sizeof(struct cpl_set_tcb_field),
+					GFP_KERNEL);
+	struct cpl_set_tcb_field *req;
+	u64 val = (hcrc ? ULP_CRC_HEADER : 0) | (dcrc ? ULP_CRC_DATA : 0);
+	val = TCB_ULP_RAW(val);
+	val |= TCB_ULP_TYPE(ULP_MODE_ISCSI);
+
+	if (!skb)
+		return -ENOMEM;
+
+	/*  set up ulp submode and page size */
+	req = (struct cpl_set_tcb_field *)skb_put(skb, sizeof(*req));
+	INIT_TP_WR(req, tid);
+	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SET_TCB_FIELD, tid));
+	req->reply_ctrl = htons(NO_REPLY(reply) | QUEUENO(csk->rss_qid));
+	req->word_cookie = htons(TCB_WORD(W_TCB_ULP_RAW));
+	req->mask = cpu_to_be64(TCB_ULP_RAW(TCB_ULP_RAW_MASK));
+	req->val = cpu_to_be64(val);
+
+	skb->queue_mapping = CPL_PRIORITY_CONTROL;
+
+	cxgb4_ofld_send(csk->snic->lldi.ports[0], skb);
+	return 0;
+}
+
+int cxgb4i_ddp_info(struct cxgb4i_snic *snic,
+			struct cxgb4i_tag_format *tformat,
+			unsigned int *txsz, unsigned int *rxsz)
+{
+	struct cxgb4i_ddp_info *ddp;
+	unsigned char idx_bits;
+
+	if (!tformat || !snic->ddp)
+		return -EINVAL;
+
+	ddp = (struct cxgb4i_ddp_info *)snic->ddp;
+
+	idx_bits = 32 - tformat->sw_bits;
+	tformat->rsvd_bits = ddp->idx_bits;
+	tformat->rsvd_shift = PPOD_IDX_SHIFT;
+	tformat->rsvd_mask = (1 << tformat->rsvd_bits) - 1;
+
+	cxgb4i_log_info("tag format: sw %u, rsvd %u,%u, mask 0x%x.\n",
+			tformat->sw_bits, tformat->rsvd_bits,
+			tformat->rsvd_shift, tformat->rsvd_mask);
+
+	*txsz = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
+			ddp->max_txsz - ISCSI_PDU_NONPAYLOAD_LEN);
+	*rxsz = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
+			ddp->max_rxsz - ISCSI_PDU_NONPAYLOAD_LEN);
+	cxgb4i_log_info("max payload size: %u/%u, %u/%u.\n",
+			*txsz, ddp->max_txsz, *rxsz, ddp->max_rxsz);
+	return 0;
+}
+
+static void __cxgb4i_ddp_cleanup(struct kref *kref)
+{
+	int i = 0;
+	struct cxgb4i_ddp_info *ddp = container_of(kref,
+						struct cxgb4i_ddp_info,
+						refcnt);
+
+	cxgb4i_log_info("kref release ddp 0x%p, snic 0x%p\n", ddp, ddp->snic);
+
+	ddp->snic->ddp = NULL;
+
+	while (i < ddp->nppods) {
+		struct cxgb4i_gather_list *gl = ddp->gl_map[i];
+
+		if (gl) {
+			int npods = (gl->nelem + PPOD_PAGES_MAX - 1) >>
+							PPOD_PAGES_SHIFT;
+			cxgb4i_log_info("snic 0x%p, ddp %d + %d\n",
+						ddp->snic, i, npods);
+			kfree(gl);
+			i += npods;
+		} else
+			i++;
+	}
+	cxgb4i_free_big_mem(ddp);
+}
+
+
+static void __cxgb4i_ddp_init(struct cxgb4i_snic *snic)
+{
+	struct cxgb4i_ddp_info *ddp = snic->ddp;
+	unsigned int ppmax, bits, tagmask, pgsz_factor[4];
+	int i;
+
+	if (ddp) {
+		kref_get(&ddp->refcnt);
+		cxgb4i_log_warn("snic 0x%p, ddp 0x%p already set up\n",
+				snic, snic->ddp);
+		return;
+	}
+
+	sw_tag_idx_bits = (__ilog2_u32(ISCSI_ITT_MASK)) + 1;
+	sw_tag_age_bits = (__ilog2_u32(ISCSI_AGE_MASK)) + 1;
+	snic->tag_format.sw_bits = sw_tag_idx_bits + sw_tag_age_bits;
+
+	cxgb4i_log_info("tag itt 0x%x, %u bits, age 0x%x, %u bits\n",
+			ISCSI_ITT_MASK, sw_tag_idx_bits,
+			ISCSI_AGE_MASK, sw_tag_age_bits);
+
+	ppmax = (snic->lldi.vr->iscsi.size >> PPOD_SIZE_SHIFT);
+	bits = __ilog2_u32(ppmax) + 1;
+	if (bits > PPOD_IDX_MAX_SIZE)
+		bits = PPOD_IDX_MAX_SIZE;
+	ppmax = (1 << (bits - 1)) - 1;
+
+	ddp = cxgb4i_alloc_big_mem(sizeof(struct cxgb4i_ddp_info) +
+			ppmax * (sizeof(struct cxgb4i_gather_list *) +
+				sizeof(struct sk_buff *)),
+				GFP_KERNEL);
+	if (!ddp) {
+		cxgb4i_log_warn("snic 0x%p unable to alloc ddp 0x%d, "
+			       "ddp disabled\n", snic, ppmax);
+		return;
+	}
+
+	ddp->gl_map = (struct cxgb4i_gather_list **)(ddp + 1);
+	spin_lock_init(&ddp->map_lock);
+	kref_init(&ddp->refcnt);
+
+	ddp->snic = snic;
+	ddp->pdev = snic->lldi.pdev;
+	ddp->max_txsz = min_t(unsigned int,
+				snic->lldi.iscsi_iolen,
+				ULP2_MAX_PKT_SIZE);
+	ddp->max_rxsz = min_t(unsigned int,
+				snic->lldi.iscsi_iolen,
+				ULP2_MAX_PKT_SIZE);
+	ddp->llimit = snic->lldi.vr->iscsi.start;
+	ddp->ulimit = ddp->llimit + snic->lldi.vr->iscsi.size;
+	ddp->nppods = ppmax;
+	ddp->idx_last = ppmax;
+	ddp->idx_bits = bits;
+	ddp->idx_mask = (1 << bits) - 1;
+	ddp->rsvd_tag_mask = (1 << (bits + PPOD_IDX_SHIFT)) - 1;
+
+	tagmask = ddp->idx_mask << PPOD_IDX_SHIFT;
+	for (i = 0; i < DDP_PGIDX_MAX; i++)
+		pgsz_factor[i] = ddp_page_order[i];
+
+	cxgb4_iscsi_init(snic->lldi.ports[0], tagmask, pgsz_factor);
+	snic->ddp = ddp;
+
+	snic->tag_format.rsvd_bits = ddp->idx_bits;
+	snic->tag_format.rsvd_shift = PPOD_IDX_SHIFT;
+	snic->tag_format.rsvd_mask = ((1 << snic->tag_format.rsvd_bits) - 1);
+
+	cxgb4i_log_info("tag format: sw %u, rsvd %u,%u, mask 0x%x.\n",
+						snic->tag_format.sw_bits,
+						snic->tag_format.rsvd_bits,
+						snic->tag_format.rsvd_shift,
+						snic->tag_format.rsvd_mask);
+
+	snic->tx_max_size = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
+				ddp->max_txsz - ISCSI_PDU_NONPAYLOAD_LEN);
+	snic->rx_max_size = min_t(unsigned int, ULP2_MAX_PDU_PAYLOAD,
+				ddp->max_rxsz - ISCSI_PDU_NONPAYLOAD_LEN);
+
+	cxgb4i_log_info("max payload size: %u/%u, %u/%u.\n",
+			snic->tx_max_size, ddp->max_txsz,
+			snic->rx_max_size, ddp->max_rxsz);
+
+	cxgb4i_log_info("snic 0x%p, nppods %u, bits %u, mask 0x%x,0x%x "
+			"pkt %u/%u, %u/%u\n",
+			snic, ppmax, ddp->idx_bits, ddp->idx_mask,
+			ddp->rsvd_tag_mask, ddp->max_txsz,
+			snic->lldi.iscsi_iolen,
+			ddp->max_rxsz, snic->lldi.iscsi_iolen);
+
+	return;
+}
+
+void cxgb4i_ddp_init(struct cxgb4i_snic *snic)
+{
+	if (page_idx == DDP_PGIDX_MAX) {
+		page_idx = cxgb4i_ddp_find_page_index(PAGE_SIZE);
+
+		if (page_idx == DDP_PGIDX_MAX) {
+			cxgb4i_log_info("system PAGE_SIZE %lu, update hw\n",
+					PAGE_SIZE);
+
+			if (cxgb4i_ddp_adjust_page_table()) {
+				cxgb4i_log_info("PAGE_SIZE %lu, ddp disabled\n",
+						PAGE_SIZE);
+				return;
+			}
+			page_idx = cxgb4i_ddp_find_page_index(PAGE_SIZE);
+		}
+		cxgb4i_log_info("system PAGE_SIZE %lu, ddp idx %u\n",
+				PAGE_SIZE, page_idx);
+	}
+
+	__cxgb4i_ddp_init(snic);
+}
+
+void cxgb4i_ddp_cleanup(struct cxgb4i_snic *snic)
+{
+	struct cxgb4i_ddp_info *ddp = snic->ddp;
+
+	cxgb4i_log_info("snic 0x%p, release ddp 0x%p\n", snic, ddp);
+	if (ddp)
+		kref_put(&ddp->refcnt, __cxgb4i_ddp_cleanup);
+}
+
diff --git a/drivers/scsi/cxgb4i/cxgb4i_ddp.h b/drivers/scsi/cxgb4i/cxgb4i_ddp.h
new file mode 100644
index 0000000..e749377
--- /dev/null
+++ b/drivers/scsi/cxgb4i/cxgb4i_ddp.h
@@ -0,0 +1,238 @@
+/*
+ * cxgb4i_ddp.h: Chelsio T4 iSCSI driver.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#ifndef	__CXGB4I_DDP_H__
+#define	__CXGB4I_DDP_H__
+
+#include <linux/vmalloc.h>
+
+struct cxgb4i_sock;
+
+struct cxgb4i_tag_format {
+	unsigned char sw_bits;
+	unsigned char rsvd_bits;
+	unsigned char rsvd_shift;
+	unsigned char filler[1];
+	unsigned int rsvd_mask;
+};
+
+struct cxgb4i_gather_list {
+	unsigned int tag;
+	unsigned int length;
+	unsigned int offset;
+	unsigned int nelem;
+	struct page **pages;
+	dma_addr_t phys_addr[0];
+};
+
+struct cxgb4i_ddp_info {
+	struct list_head list;
+	struct kref refcnt;
+	struct cxgb4i_snic *snic;
+	struct pci_dev *pdev;
+	unsigned int max_txsz;
+	unsigned int max_rxsz;
+	unsigned int llimit;
+	unsigned int ulimit;
+	unsigned int nppods;
+	unsigned int idx_last;
+	unsigned char idx_bits;
+	unsigned char filler[3];
+	unsigned int idx_mask;
+	unsigned int rsvd_tag_mask;
+	spinlock_t map_lock;
+	struct cxgb4i_gather_list **gl_map;
+};
+
+#define ISCSI_PDU_NONPAYLOAD_LEN	312 /* bhs(48) + ahs(256) + digest(8)*/
+#define ULP2_MAX_PKT_SIZE		16224
+#define ULP2_MAX_PDU_PAYLOAD	\
+	(ULP2_MAX_PKT_SIZE - ISCSI_PDU_NONPAYLOAD_LEN)
+/* # of pages a pagepod can hold without needing another pagepod */
+#define PPOD_PAGES			4
+#define PPOD_PAGES_MAX			4
+#define PPOD_PAGES_SHIFT		2       /*  4 pages per pod */
+
+struct pagepod_hdr {
+	unsigned int vld_tid;
+	unsigned int pgsz_tag_clr;
+	unsigned int max_offset;
+	unsigned int page_offset;
+	unsigned long long rsvd;
+};
+
+struct pagepod {
+	struct pagepod_hdr hdr;
+	unsigned long long addr[PPOD_PAGES_MAX + 1];
+};
+
+#define PPOD_SIZE               sizeof(struct pagepod)  /*  64 */
+#define PPOD_SIZE_SHIFT         6
+
+#define ULPMEM_DSGL_MAX_NPPODS	16	/*  1024/PPOD_SIZE */
+#define ULPMEM_IDATA_MAX_NPPODS	4	/*  256/PPOD_SIZE */
+#define PCIE_MEMWIN_MAX_NPPODS	16	/*  1024/PPOD_SIZE */
+
+#define PPOD_COLOR_SHIFT	0
+#define PPOD_COLOR_MASK		0x3F
+#define PPOD_COLOR_SIZE         6
+#define PPOD_COLOR(x)		((x) << PPOD_COLOR_SHIFT)
+
+#define PPOD_TAG_SHIFT	6
+#define PPOD_TAG_MASK	0xFFFFFF
+#define PPOD_TAG(x)	((x) << PPOD_TAG_SHIFT)
+
+#define PPOD_PGSZ_SHIFT	30
+#define PPOD_PGSZ_MASK	0x3
+#define PPOD_PGSZ(x)	((x) << PPOD_PGSZ_SHIFT)
+
+#define PPOD_TID_SHIFT	32
+#define PPOD_TID_MASK	0xFFFFFF
+#define PPOD_TID(x)	((__u64)(x) << PPOD_TID_SHIFT)
+
+#define PPOD_VALID_SHIFT	56
+#define PPOD_VALID(x)	((__u64)(x) << PPOD_VALID_SHIFT)
+#define PPOD_VALID_FLAG	PPOD_VALID(1ULL)
+
+#define PPOD_LEN_SHIFT	32
+#define PPOD_LEN_MASK	0xFFFFFFFF
+#define PPOD_LEN(x)	((__u64)(x) << PPOD_LEN_SHIFT)
+
+#define PPOD_OFST_SHIFT	0
+#define PPOD_OFST_MASK	0xFFFFFFFF
+#define PPOD_OFST(x)	((x) << PPOD_OFST_SHIFT)
+
+#define PPOD_IDX_SHIFT          PPOD_COLOR_SIZE
+#define PPOD_IDX_MAX_SIZE       24
+
+
+static inline void *cxgb4i_alloc_big_mem(unsigned int size,
+						gfp_t gfp)
+{
+	void *p = kmalloc(size, gfp);
+	if (!p)
+		p = vmalloc(size);
+	if (p)
+		memset(p, 0, size);
+	return p;
+}
+
+static inline void cxgb4i_free_big_mem(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		vfree(addr);
+	else
+		kfree(addr);
+}
+
+
+static inline int cxgb4i_is_ddp_tag(struct cxgb4i_tag_format *tformat, u32 tag)
+{
+	return !(tag & (1 << (tformat->rsvd_bits + tformat->rsvd_shift - 1)));
+}
+
+static inline int cxgb4i_sw_tag_usable(struct cxgb4i_tag_format *tformat,
+					u32 sw_tag)
+{
+	sw_tag >>= (32 - tformat->rsvd_bits);
+	return !sw_tag;
+}
+
+static inline u32 cxgb4i_set_non_ddp_tag(struct cxgb4i_tag_format *tformat,
+					u32 sw_tag)
+{
+	unsigned char shift = tformat->rsvd_bits + tformat->rsvd_shift - 1;
+
+	u32 mask = (1 << shift) - 1;
+
+	if (sw_tag && (sw_tag & ~mask)) {
+		u32 v1 = sw_tag & ((1 << shift) - 1);
+		u32 v2 = (sw_tag >> (shift - 1)) << shift;
+
+		return v2 | v1 | 1 << shift;
+	}
+
+	return sw_tag | 1 << shift;
+}
+
+static inline u32 cxgb4i_ddp_tag_base(struct cxgb4i_tag_format *tformat,
+					u32 sw_tag)
+{
+	u32 mask = (1 << tformat->rsvd_shift) - 1;
+
+	if (sw_tag && (sw_tag & ~mask)) {
+		u32 v1 = sw_tag & mask;
+		u32 v2 = sw_tag >> tformat->rsvd_shift;
+
+		v2 <<= tformat->rsvd_bits + tformat->rsvd_shift;
+
+		return v2 | v1;
+	}
+
+	return sw_tag;
+}
+
+static inline u32 cxgb4i_tag_rsvd_bits(struct cxgb4i_tag_format *tformat,
+					u32 tag)
+{
+	if (cxgb4i_is_ddp_tag(tformat, tag))
+		return (tag >> tformat->rsvd_shift) & tformat->rsvd_mask;
+
+	return 0;
+}
+
+static inline u32 cxgb4i_tag_nonrsvd_bits(struct cxgb4i_tag_format *tformat,
+					u32 tag)
+{
+	unsigned char shift = tformat->rsvd_bits + tformat->rsvd_shift - 1;
+	u32 v1, v2;
+
+	if (cxgb4i_is_ddp_tag(tformat, tag)) {
+		v1 = tag & ((1 << tformat->rsvd_shift) - 1);
+		v2 = (tag >> (shift + 1)) << tformat->rsvd_shift;
+	} else {
+		u32 mask = (1 << shift) - 1;
+		tag &= ~(1 << shift);
+		v1 = tag & mask;
+		v2 = (tag >> 1) & ~mask;
+	}
+	return v1 | v2;
+}
+
+int cxgb4i_ddp_tag_reserve(struct cxgb4i_snic *, unsigned int,
+				struct cxgb4i_tag_format *, u32 *,
+				struct cxgb4i_gather_list *, gfp_t);
+void cxgb4i_ddp_tag_release(struct cxgb4i_snic *, u32);
+struct cxgb4i_gather_list *cxgb4i_ddp_make_gl(unsigned int,
+					struct scatterlist *,
+					unsigned int,
+					struct pci_dev *,
+					gfp_t);
+void cxgb4i_ddp_release_gl(struct cxgb4i_gather_list *,
+			struct pci_dev *pdev);
+int cxgb4i_ddp_setup_conn_host_pagesize(struct cxgb4i_sock*, unsigned int,
+					int);
+int cxgb3i_ddp_setup_conn_pagesize(struct cxgb4i_sock *, unsigned int, int,
+					unsigned long);
+
+int cxgb4i_ddp_setup_conn_digest(struct cxgb4i_sock *, unsigned int,
+				int, int, int);
+int cxgb4i_ddp_find_page_idx(unsigned long);
+int cxgb4i_snic_ddp_info(struct cxgb4i_snic *, struct cxgb4i_tag_format *,
+			unsigned int *, unsigned int *);
+
+void cxgb4i_ddp_init(struct cxgb4i_snic *);
+void cxgb4i_ddp_cleanup(struct cxgb4i_snic *);
+
+#endif	/* __CXGB4I_DDP_H__ */
+
diff --git a/drivers/scsi/cxgb4i/cxgb4i_offload.c b/drivers/scsi/cxgb4i/cxgb4i_offload.c
new file mode 100644
index 0000000..dfa1c72
--- /dev/null
+++ b/drivers/scsi/cxgb4i/cxgb4i_offload.c
@@ -0,0 +1,1853 @@
+/*
+ * cxgb4i_offload.c: Chelsio T4 iSCSI driver.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#include <linux/if_vlan.h>
+#include <net/dst.h>
+#include <net/route.h>
+#include <net/tcp.h>
+
+#include "cxgb4i.h"
+#include "cxgb4i_offload.h"
+#include "libcxgbi.h"
+
+static int cxgb4i_rcv_win = 256 * 1024;
+module_param(cxgb4i_rcv_win, int, 0644);
+MODULE_PARM_DESC(cxgb4i_rcv_win, "TCP reveive window in bytes");
+
+static int cxgb4i_snd_win = 128 * 1024;
+module_param(cxgb4i_snd_win, int, 0644);
+MODULE_PARM_DESC(cxgb4i_snd_win, "TCP send window in bytes");
+
+static int cxgb4i_rx_credit_thres = 10 * 1024;
+module_param(cxgb4i_rx_credit_thres, int, 0644);
+MODULE_PARM_DESC(cxgb4i_rx_credit_thres,
+		"RX credits return threshold in bytes (default=10KB)");
+
+static unsigned int cxgb4i_max_connect = (8 * 1024);
+module_param(cxgb4i_max_connect, uint, 0644);
+MODULE_PARM_DESC(cxgb4i_max_connect, "Maximum number of connections");
+
+static unsigned short cxgb4i_sport_base = 20000;
+module_param(cxgb4i_sport_base, ushort, 0644);
+MODULE_PARM_DESC(cxgb4i_sport_base, "Starting port number (default 20000)");
+
+#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
+#define RCV_BUFSIZ_MASK	0x3FFU
+
+static void cxgb4i_sock_release_offload_resources(struct cxgb4i_sock *);
+static void cxgb4i_sock_conn_closing(struct cxgb4i_sock *);
+static int cxgb4i_sock_push_tx_frames(struct cxgb4i_sock *, int);
+
+
+#define MAX_IMM_TX_PKT_LEN 128
+
+/*
+ * is_ofld_imm - check whether a packet can be sent as immediate data
+ * @skb: the packet
+ *
+ * Returns true if a packet can be sent as an offload WR with immediate
+ * data.  We currently use the same limit as for Ethernet packets.
+ */
+static inline int is_ofld_imm(const struct sk_buff *skb)
+{
+	return skb->len <= (MAX_IMM_TX_PKT_LEN -
+			sizeof(struct fw_ofld_tx_data_wr));
+}
+
+static int cxgb4i_sock_get_port(struct cxgb4i_sock *csk)
+{
+	unsigned int start;
+	int idx;
+
+	if (!csk->snic->pmap)
+		goto error_out;
+
+	if (csk->saddr.sin_port) {
+		cxgb4i_log_error("connect, sin_port none ZERO %u\n",
+				ntohs(csk->saddr.sin_port));
+		return -EADDRINUSE;
+	}
+
+	spin_lock_bh(&csk->snic->pmap->lock);
+	start = idx = csk->snic->pmap->next;
+
+	do {
+		if (++idx >= cxgb4i_max_connect)
+			idx = 0;
+		if (!csk->snic->pmap->port_csk[idx]) {
+			csk->saddr.sin_port = htons(cxgb4i_sport_base + idx);
+			csk->snic->pmap->next = idx;
+			csk->snic->pmap->port_csk[idx] = csk;
+			spin_unlock_bh(&csk->snic->pmap->lock);
+
+			cxgb4i_conn_debug("reserved port %u\n",
+						cxgb4i_sport_base + idx);
+
+			return 0;
+		}
+	} while (idx != start);
+	spin_unlock_bh(&csk->snic->pmap->lock);
+
+error_out:
+	return -EADDRNOTAVAIL;
+}
+
+static void cxgb4i_sock_put_port(struct cxgb4i_sock *csk)
+{
+	if (csk->saddr.sin_port) {
+		int idx = ntohs(csk->saddr.sin_port) - cxgb4i_sport_base;
+
+		csk->saddr.sin_port = 0;
+		if (idx < 0 || idx >= cxgb4i_sport_base)
+			return;
+
+		spin_lock_bh(&csk->snic->pmap->lock);
+		csk->snic->pmap->port_csk[idx] = NULL;
+		spin_unlock_bh(&csk->snic->pmap->lock);
+
+		cxgb4i_conn_debug("released port %u\n",
+				cxgb4i_sport_base + idx);
+	}
+}
+
+static inline void cxgb4i_sock_set_flag(struct cxgb4i_sock *csk,
+					enum cxgb4i_sock_flags flag)
+{
+	__set_bit(flag, &csk->flags);
+	cxgb4i_conn_debug("csk 0x%p, set %d, state %u, flags 0x%lu\n",
+			csk, flag, csk->state, csk->flags);
+}
+
+static inline void cxgb4i_sock_clear_flag(struct cxgb4i_sock *csk,
+					enum cxgb4i_sock_flags flag)
+{
+	__clear_bit(flag, &csk->flags);
+	cxgb4i_conn_debug("csk 0x%p, clear %d, state %u, flags 0x%lu\n",
+			csk, flag, csk->state, csk->flags);
+}
+
+static inline int cxgb4i_sock_flag(struct cxgb4i_sock *csk,
+				enum cxgb4i_sock_flags flag)
+{
+	if (csk == NULL)
+		return 0;
+
+	return test_bit(flag, &csk->flags);
+}
+
+static void cxgb4i_sock_set_state(struct cxgb4i_sock *csk, int state)
+{
+	csk->state = state;
+}
+
+static inline void cxgb4i_sock_hold(struct cxgb4i_sock *csk)
+{
+	atomic_inc(&csk->refcnt);
+}
+
+static inline void cxgb4i_sock_put(struct cxgb4i_sock *csk)
+{
+	if (atomic_dec_and_test(&csk->refcnt)) {
+		cxgb4i_conn_debug("free csk 0x%p, state %u, flags 0x%lx\n",
+				csk, csk->state, csk->flags);
+		kfree(csk);
+	}
+}
+
+static void cxgb4i_sock_closed(struct cxgb4i_sock *csk)
+{
+	cxgb4i_conn_debug("csk 0x%p, state %u, flags 0x%lx\n",
+			csk, csk->state, csk->flags);
+
+	cxgb4i_sock_put_port(csk);
+	cxgb4i_sock_release_offload_resources(csk);
+	cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_CLOSED);
+	cxgb4i_sock_conn_closing(csk);
+}
+
+static unsigned int cxgb4i_find_best_mtu(struct cxgb4i_snic *snic,
+						unsigned short mtu)
+{
+	int i = 0;
+
+	while (i < NMTUS - 1 && snic->lldi.mtus[i + 1] <= mtu)
+		++i;
+
+	return i;
+}
+
+static unsigned int cxgb4i_select_mss(struct cxgb4i_sock *csk,
+						unsigned int pmtu)
+{
+	unsigned int idx;
+	struct dst_entry *dst = csk->dst;
+	u16 advmss = dst_metric(dst, RTAX_ADVMSS);
+
+	if (advmss > pmtu - 40)
+		advmss = pmtu - 40;
+	if (advmss < csk->snic->lldi.mtus[0] - 40)
+		advmss = csk->snic->lldi.mtus[0] - 40;
+	idx = cxgb4i_find_best_mtu(csk->snic, advmss + 40);
+
+	return idx;
+}
+
+static inline int cxgb4i_sock_compute_wscale(int win)
+{
+	int wscale = 0;
+
+	while (wscale < 14 && (65535 << wscale) < win)
+		wscale++;
+
+	return wscale;
+}
+
+static void cxgb4i_sock_make_act_open_req(struct cxgb4i_sock *csk,
+					   struct sk_buff *skb,
+					   unsigned int qid_atid,
+					   struct l2t_entry *e)
+{
+	struct cpl_act_open_req *req;
+	unsigned long long opt0;
+	unsigned int opt2;
+	int wscale;
+
+	cxgb4i_conn_debug("csk 0x%p, atid 0x%x\n", csk, qid_atid);
+
+	wscale = cxgb4i_sock_compute_wscale(csk->mss_idx);
+
+	opt0 = KEEP_ALIVE(1) |
+		WND_SCALE(wscale) |
+		MSS_IDX(csk->mss_idx) |
+		L2T_IDX(csk->l2t->idx) |
+		TX_CHAN(csk->tx_chan) |
+		SMAC_SEL(csk->smac_idx) |
+		RCV_BUFSIZ(cxgb4i_rcv_win >> 10);
+
+	opt2 = RX_CHANNEL(0) |
+		RSS_QUEUE_VALID |
+		RSS_QUEUE(csk->rss_qid);
+
+	skb->queue_mapping = CPL_PRIORITY_SETUP;
+	req = (struct cpl_act_open_req *)__skb_put(skb, sizeof(*req));
+	INIT_TP_WR(req, 0);
+	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ACT_OPEN_REQ,
+					qid_atid));
+	req->local_port = csk->saddr.sin_port;
+	req->peer_port = csk->daddr.sin_port;
+	req->local_ip = csk->saddr.sin_addr.s_addr;
+	req->peer_ip = csk->daddr.sin_addr.s_addr;
+	req->opt0 = cpu_to_be64(opt0);
+	req->params = 0;
+	req->opt2 = cpu_to_be32(opt2);
+}
+
+static void cxgb4i_fail_act_open(struct cxgb4i_sock *csk, int errno)
+{
+	cxgb4i_conn_debug("csk 0%p, state %u, flag 0x%lx\n", csk,
+			csk->state, csk->flags);
+	csk->err = errno;
+	cxgb4i_sock_closed(csk);
+}
+
+static void cxgb4i_act_open_req_arp_failure(void *handle, struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk = (struct cxgb4i_sock *)skb->sk;
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+	if (csk->state == CXGB4I_CSK_ST_CONNECTING)
+		cxgb4i_fail_act_open(csk, -EHOSTUNREACH);
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+	__kfree_skb(skb);
+}
+
+static void cxgb4i_sock_skb_entail(struct cxgb4i_sock *csk,
+					struct sk_buff *skb,
+					int flags)
+{
+	cxgb4i_skb_tcp_seq(skb) = csk->write_seq;
+	cxgb4i_skb_flags(skb) = flags;
+	__skb_queue_tail(&csk->write_queue, skb);
+}
+
+static void cxgb4i_sock_send_close_req(struct cxgb4i_sock *csk)
+{
+	struct sk_buff *skb = csk->cpl_close;
+	struct cpl_close_con_req *req = (struct cpl_close_con_req *)skb->head;
+	unsigned int tid = csk->hwtid;
+
+	csk->cpl_close = NULL;
+
+	skb->queue_mapping = CPL_PRIORITY_DATA;
+	INIT_TP_WR(req, tid);
+
+	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_CLOSE_CON_REQ, tid));
+	req->rsvd = 0;
+
+	cxgb4i_sock_skb_entail(csk, skb, CXGB4I_SKCB_FLAG_NO_APPEND);
+	if (csk->state != CXGB4I_CSK_ST_CONNECTING)
+		cxgb4i_sock_push_tx_frames(csk, 1);
+}
+
+static void cxgb4i_sock_abort_arp_failure(void *handle, struct sk_buff *skb)
+{
+	struct cpl_abort_req *req = cplhdr(skb);
+	struct cxgb4i_sock *csk = (struct cxgb4i_sock *)handle;
+
+	req->cmd = CPL_ABORT_NO_RST;
+	cxgb4_ofld_send(csk->snic->lldi.ports[csk->port_id], skb);
+}
+
+static inline void cxgb4i_sock_purge_write_queue(struct cxgb4i_sock *csk)
+{
+	struct sk_buff *skb;
+
+	while ((skb = __skb_dequeue(&csk->write_queue)))
+		__kfree_skb(skb);
+}
+
+static void cxgb4i_sock_send_abort_req(struct cxgb4i_sock *csk)
+{
+	struct cpl_abort_req *req;
+	struct sk_buff *skb = csk->cpl_abort_req;
+
+	if (unlikely(csk->state == CXGB4I_CSK_ST_ABORTING) ||
+			!skb || !csk->snic)
+		return;
+
+	cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_ABORTING);
+
+	cxgb4i_conn_debug("csk 0x%p, flag ABORT_RPL + ABORT_SHUT\n", csk);
+
+	cxgb4i_sock_set_state(csk, CXGB4I_CSK_FL_ABORT_RPL_PENDING);
+
+	cxgb4i_sock_purge_write_queue(csk);
+
+	csk->cpl_abort_req = NULL;
+	req = (struct cpl_abort_req *)skb->head;
+
+	skb->queue_mapping = CPL_PRIORITY_DATA;
+	t4_set_arp_err_handler(skb, csk, cxgb4i_sock_abort_arp_failure);
+	INIT_TP_WR(req, csk->hwtid);
+	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_REQ, csk->hwtid));
+	req->rsvd0 = htonl(csk->snd_nxt);
+	req->rsvd1 = !cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_TX_DATA_SENT);
+	req->cmd = CPL_ABORT_SEND_RST;
+
+	cxgb4_l2t_send(csk->snic->lldi.ports[csk->port_id], skb, csk->l2t);
+}
+
+static void cxgb4i_sock_send_abort_rpl(struct cxgb4i_sock *csk, int rst_status)
+{
+	struct sk_buff *skb = csk->cpl_abort_rpl;
+	struct cpl_abort_rpl *rpl = (struct cpl_abort_rpl *)skb->head;
+
+	csk->cpl_abort_rpl = NULL;
+
+	skb->queue_mapping = CPL_PRIORITY_DATA;
+	INIT_TP_WR(rpl, csk->hwtid);
+	OPCODE_TID(rpl) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_RPL, csk->hwtid));
+	rpl->cmd = rst_status;
+
+	cxgb4_ofld_send(csk->snic->lldi.ports[csk->port_id], skb);
+}
+
+static u32 cxgb4i_csk_send_rx_credits(struct cxgb4i_sock *csk, u32 credits)
+{
+	struct sk_buff *skb;
+	struct cpl_rx_data_ack *req;
+	int wrlen = roundup(sizeof(*req), 16);
+
+	skb = alloc_skb(wrlen, GFP_ATOMIC);
+	if (!skb)
+		return 0;
+
+	req = (struct cpl_rx_data_ack *)__skb_put(skb, wrlen);
+	memset(req, 0, wrlen);
+	skb->queue_mapping = CPL_PRIORITY_ACK;
+	INIT_TP_WR(req, csk->hwtid);
+	OPCODE_TID(req) =
+		cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK, csk->hwtid));
+	req->credit_dack = cpu_to_be32(RX_CREDITS(credits) | RX_FORCE_ACK(1));
+	cxgb4_ofld_send(csk->snic->lldi.ports[csk->port_id], skb);
+	return credits;
+}
+
+
+#define SKB_WR_LIST_SIZE	(MAX_SKB_FRAGS + 2)
+
+static const unsigned int cxgb4i_ulp_extra_len[] = { 0, 4, 4, 8 };
+static inline unsigned int ulp_extra_len(const struct sk_buff *skb)
+{
+	return cxgb4i_ulp_extra_len[cxgb4i_skb_ulp_mode(skb) & 3];
+}
+
+static inline void cxgb4i_sock_reset_wr_list(struct cxgb4i_sock *csk)
+{
+	csk->wr_pending_head = csk->wr_pending_tail = NULL;
+}
+
+static inline void cxgb4i_sock_enqueue_wr(struct cxgb4i_sock *csk,
+						struct sk_buff *skb)
+{
+	cxgb4i_skb_tx_wr_next(skb) = NULL;
+
+	/*
+	 * We want to take an extra reference since both us and the driver
+	 * need to free the packet before it's really freed. We know there's
+	 * just one user currently so we use atomic_set rather than skb_get
+	 * to avoid the atomic op.
+	 */
+	atomic_set(&skb->users, 2);
+
+	if (!csk->wr_pending_head)
+		csk->wr_pending_head = skb;
+
+	else
+		cxgb4i_skb_tx_wr_next(csk->wr_pending_tail) = skb;
+
+	csk->wr_pending_tail = skb;
+}
+
+static int cxgb4i_sock_count_pending_wrs(const struct cxgb4i_sock *csk)
+{
+	int n = 0;
+	const struct sk_buff *skb = csk->wr_pending_head;
+
+	while (skb) {
+		n += skb->csum;
+		skb = cxgb4i_skb_tx_wr_next(skb);
+	}
+	return n;
+}
+
+static inline struct sk_buff *cxgb4i_sock_peek_wr(const struct cxgb4i_sock *csk)
+{
+	return csk->wr_pending_head;
+}
+
+static inline void cxgb4i_sock_free_wr_skb(struct sk_buff *skb)
+{
+	kfree_skb(skb);
+}
+
+static inline struct sk_buff *cxgb4i_sock_dequeue_wr(struct cxgb4i_sock *csk)
+{
+	struct sk_buff *skb = csk->wr_pending_head;
+
+	if (likely(skb)) {
+		csk->wr_pending_head = cxgb4i_skb_tx_wr_next(skb);
+		cxgb4i_skb_tx_wr_next(skb) = NULL;
+	}
+	return skb;
+}
+
+static void cxgb4i_sock_purge_wr_queue(struct cxgb4i_sock *csk)
+{
+	struct sk_buff *skb;
+
+	while ((skb = cxgb4i_sock_dequeue_wr(csk)) != NULL)
+		cxgb4i_sock_free_wr_skb(skb);
+}
+
+/*
+ * sgl_len - calculates the size of an SGL of the given capacity
+ * @n: the number of SGL entries
+ * Calculates the number of flits needed for a scatter/gather list that
+ * can hold the given number of entries.
+ */
+static inline unsigned int sgl_len(unsigned int n)
+{
+	n--;
+	return (3 * n) / 2 + (n & 1) + 2;
+}
+
+/*
+ * calc_tx_flits_ofld - calculate # of flits for an offload packet
+ * @skb: the packet
+ *
+ * Returns the number of flits needed for the given offload packet.
+ * These packets are already fully constructed and no additional headers
+ * will be added.
+ */
+static inline unsigned int calc_tx_flits_ofld(const struct sk_buff *skb)
+{
+	unsigned int flits, cnt;
+
+	if (is_ofld_imm(skb))
+		return DIV_ROUND_UP(skb->len, 8);
+
+	flits = skb_transport_offset(skb) / 8;
+	cnt = skb_shinfo(skb)->nr_frags;
+	if (skb->tail != skb->transport_header)
+		cnt++;
+	return flits + sgl_len(cnt);
+}
+
+static inline void cxgb4i_sock_send_tx_flowc_wr(struct cxgb4i_sock *csk)
+{
+	struct sk_buff *skb;
+	struct fw_flowc_wr *flowc;
+	int flowclen, i;
+
+	flowclen = 80;
+	skb = alloc_skb(flowclen, GFP_ATOMIC);
+	flowc = (struct fw_flowc_wr *)__skb_put(skb, flowclen);
+
+	flowc->op_to_nparams =
+		htonl(FW_WR_OP(FW_FLOWC_WR) | FW_FLOWC_WR_NPARAMS(8));
+	flowc->flowid_len16 =
+		htonl(FW_WR_LEN16(DIV_ROUND_UP(72, 16)) |
+				FW_WR_FLOWID(csk->hwtid));
+
+	flowc->mnemval[0].mnemonic = FW_FLOWC_MNEM_PFNVFN;
+	flowc->mnemval[0].val = htonl(0);
+	flowc->mnemval[1].mnemonic = FW_FLOWC_MNEM_CH;
+	flowc->mnemval[1].val = htonl(csk->tx_chan);
+	flowc->mnemval[2].mnemonic = FW_FLOWC_MNEM_PORT;
+	flowc->mnemval[2].val = htonl(csk->tx_chan);
+	flowc->mnemval[3].mnemonic = FW_FLOWC_MNEM_IQID;
+	flowc->mnemval[3].val = htonl(csk->rss_qid);
+	flowc->mnemval[4].mnemonic = FW_FLOWC_MNEM_SNDNXT;
+	flowc->mnemval[4].val = htonl(csk->snd_nxt);
+	flowc->mnemval[5].mnemonic = FW_FLOWC_MNEM_RCVNXT;
+	flowc->mnemval[5].val = htonl(csk->rcv_nxt);
+	flowc->mnemval[6].mnemonic = FW_FLOWC_MNEM_SNDBUF;
+	flowc->mnemval[6].val = htonl(cxgb4i_snd_win);
+	flowc->mnemval[7].mnemonic = FW_FLOWC_MNEM_MSS;
+	flowc->mnemval[7].val = htonl(csk->mss_idx);
+	flowc->mnemval[8].mnemonic = 0;
+	flowc->mnemval[8].val = 0;
+	for (i = 0; i < 9; i++) {
+		flowc->mnemval[i].r4[0] = 0;
+		flowc->mnemval[i].r4[1] = 0;
+		flowc->mnemval[i].r4[2] = 0;
+	}
+
+	skb->queue_mapping = CPL_PRIORITY_DATA;
+
+	cxgb4_ofld_send(csk->snic->lldi.ports[csk->port_id], skb);
+}
+
+static inline void cxgb4i_sock_make_tx_data_wr(struct cxgb4i_sock *csk,
+						struct sk_buff *skb, int dlen,
+						int len, u32 credits,
+						int req_completion)
+{
+	struct fw_ofld_tx_data_wr *req;
+	unsigned int wr_ulp_mode;
+
+	if (is_ofld_imm(skb)) {
+			req = (struct fw_ofld_tx_data_wr *)
+				__skb_push(skb, sizeof(*req));
+			req->op_to_immdlen =
+				cpu_to_be32(FW_WR_OP(FW_OFLD_TX_DATA_WR) |
+					FW_WR_COMPL(req_completion) |
+					FW_WR_IMMDLEN(dlen));
+			req->flowid_len16 =
+				cpu_to_be32(FW_WR_FLOWID(csk->hwtid) |
+						FW_WR_LEN16(credits));
+	} else {
+		req = (struct fw_ofld_tx_data_wr *)
+			__skb_push(skb, sizeof(*req));
+		req->op_to_immdlen =
+			cpu_to_be32(FW_WR_OP(FW_OFLD_TX_DATA_WR) |
+					FW_WR_COMPL(req_completion) |
+					FW_WR_IMMDLEN(0));
+		req->flowid_len16 =
+			cpu_to_be32(FW_WR_FLOWID(csk->hwtid) |
+					FW_WR_LEN16(credits));
+	}
+
+	wr_ulp_mode =
+		FW_OFLD_TX_DATA_WR_ULPMODE(cxgb4i_skb_ulp_mode(skb) >> 4) |
+		FW_OFLD_TX_DATA_WR_ULPSUBMODE(cxgb4i_skb_ulp_mode(skb) & 3);
+
+	req->tunnel_to_proxy = cpu_to_be32(wr_ulp_mode) |
+		FW_OFLD_TX_DATA_WR_SHOVE(skb_peek(&csk->write_queue) ? 0 : 1);
+
+	req->plen = cpu_to_be32(len);
+
+	if (!cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_TX_DATA_SENT))
+		cxgb4i_sock_set_flag(csk, CXGB4I_CSK_FL_TX_DATA_SENT);
+}
+
+static void cxgb4i_sock_arp_failure_discard(void *handle, struct sk_buff *skb)
+{
+	kfree_skb(skb);
+}
+
+static int cxgb4i_sock_push_tx_frames(struct cxgb4i_sock *csk,
+						int req_completion)
+{
+	int total_size = 0;
+	struct sk_buff *skb;
+	struct cxgb4i_snic *snic;
+
+	if (unlikely(csk->state == CXGB4I_CSK_ST_CONNECTING ||
+				csk->state == CXGB4I_CSK_ST_CLOSE_WAIT_1 ||
+				csk->state >= CXGB4I_CSK_ST_ABORTING)) {
+		cxgb4i_tx_debug("csk 0x%p, in closing state %u.\n",
+				csk, csk->state);
+		return 0;
+	}
+
+	snic = csk->snic;
+
+	while (csk->wr_cred
+			&& (skb = skb_peek(&csk->write_queue)) != NULL) {
+		int dlen;
+		int len;
+		unsigned int credits_needed;
+
+		dlen = len = skb->len;
+		skb_reset_transport_header(skb);
+
+		if (is_ofld_imm(skb))
+			credits_needed = DIV_ROUND_UP(dlen +
+					sizeof(struct fw_ofld_tx_data_wr), 16);
+		else
+			credits_needed = DIV_ROUND_UP(8 *
+					calc_tx_flits_ofld(skb)+
+					sizeof(struct fw_ofld_tx_data_wr), 16);
+
+		if (csk->wr_cred < credits_needed) {
+			cxgb4i_tx_debug("csk 0x%p, skb len %u/%u, "
+					"wr %d < %u.\n",
+					csk, skb->len, skb->data_len,
+					credits_needed, csk->wr_cred);
+			break;
+		}
+
+		__skb_unlink(skb, &csk->write_queue);
+		skb->queue_mapping = CPL_PRIORITY_DATA;
+		skb->csum = credits_needed; /* remember this until the WR_ACK */
+		csk->wr_cred -= credits_needed;
+		csk->wr_una_cred += credits_needed;
+		cxgb4i_sock_enqueue_wr(csk, skb);
+
+		cxgb4i_tx_debug("csk 0x%p, enqueue, skb len %u/%u, "
+				"wr %d, left %u, unack %u.\n",
+				csk, skb->len, skb->data_len,
+				credits_needed, csk->wr_cred,
+				csk->wr_una_cred);
+
+
+		if (likely(cxgb4i_skb_flags(skb) &
+					CXGB4I_SKCB_FLAG_NEED_HDR)) {
+			len += ulp_extra_len(skb);
+			if (!cxgb4i_sock_flag(csk,
+						CXGB4I_CSK_FL_TX_DATA_SENT)) {
+				cxgb4i_sock_send_tx_flowc_wr(csk);
+				skb->csum += 5;
+				csk->wr_cred -= 5;
+				csk->wr_una_cred += 5;
+			}
+
+			if ((req_completion &&
+				csk->wr_una_cred == credits_needed) ||
+				(cxgb4i_skb_flags(skb) &
+				  CXGB4I_SKCB_FLAG_COMPL) ||
+				csk->wr_una_cred >= csk->wr_max_cred / 2) {
+				req_completion = 1;
+				csk->wr_una_cred = 0;
+			}
+			cxgb4i_sock_make_tx_data_wr(csk, skb, dlen, len,
+							credits_needed,
+							req_completion);
+			csk->snd_nxt += len;
+
+			if (req_completion)
+				cxgb4i_skb_flags(skb) &=
+					~CXGB4I_SKCB_FLAG_NEED_HDR;
+		}
+
+		total_size += skb->truesize;
+		t4_set_arp_err_handler(skb, csk,
+					cxgb4i_sock_arp_failure_discard);
+		cxgb4_l2t_send(snic->lldi.ports[csk->port_id], skb, csk->l2t);
+	}
+	return total_size;
+}
+
+static inline void cxgb4i_sock_free_atid(struct cxgb4i_sock *csk)
+{
+	cxgb4_free_atid(csk->snic->lldi.tids, csk->atid);
+	cxgb4i_sock_put(csk);
+}
+
+static void cxgb4i_sock_established(struct cxgb4i_sock *csk, u32 snd_isn,
+					unsigned int opt)
+{
+	cxgb4i_conn_debug("csk 0x%p, state %u.\n", csk, csk->state);
+
+	csk->write_seq = csk->snd_nxt = csk->snd_una = snd_isn;
+
+	/*
+	 * Causes the first RX_DATA_ACK to supply any Rx credits we couldn't
+	 * pass through opt0.
+	 */
+	if (cxgb4i_rcv_win > (RCV_BUFSIZ_MASK << 10))
+		csk->rcv_wup -= cxgb4i_rcv_win - (RCV_BUFSIZ_MASK << 10);
+
+	dst_confirm(csk->dst);
+
+	smp_mb();
+
+	cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_ESTABLISHED);
+}
+
+static int cxgb4i_cpl_act_establish(struct cxgb4i_snic *snic,
+						struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_act_establish *req = cplhdr(skb);
+	unsigned int hwtid = GET_TID(req);
+	unsigned int atid = GET_TID_TID(ntohl(req->tos_atid));
+	struct tid_info *t = snic->lldi.tids;
+	u32 rcv_isn = be32_to_cpu(req->rcv_isn);
+
+	csk = lookup_atid(t, atid);
+
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	cxgb4i_conn_debug("csk 0x%p, state %u, flag 0x%lx\n",
+				csk, csk->state, csk->flags);
+	csk->hwtid = hwtid;
+	cxgb4i_sock_hold(csk);
+	cxgb4_insert_tid(snic->lldi.tids, csk, hwtid);
+	cxgb4_free_atid(snic->lldi.tids, atid);
+
+	spin_lock_bh(&csk->lock);
+
+	if (unlikely(csk->state != CXGB4I_CSK_ST_CONNECTING))
+		cxgb4i_log_error("TID %u expected SYN_SENT, got EST., s %u\n",
+				csk->hwtid, csk->state);
+
+	csk->copied_seq = csk->rcv_wup = csk->rcv_nxt = rcv_isn;
+	cxgb4i_sock_established(csk, ntohl(req->snd_isn), ntohs(req->tcp_opt));
+
+	__kfree_skb(skb);
+
+	if (unlikely(cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_ACTIVE_CLOSE_NEEDED)))
+		cxgb4i_sock_send_abort_req(csk);
+	else {
+		if (skb_queue_len(&csk->write_queue))
+			cxgb4i_sock_push_tx_frames(csk, 1);
+
+		cxgbi_conn_tx_open(csk);
+	}
+
+	spin_unlock_bh(&csk->lock);
+
+	return 0;
+}
+
+static int act_open_rpl_status_to_errno(int status)
+{
+	switch (status) {
+	case CPL_ERR_CONN_RESET:
+		return -ECONNREFUSED;
+	case CPL_ERR_ARP_MISS:
+		return -EHOSTUNREACH;
+	case CPL_ERR_CONN_TIMEDOUT:
+		return -ETIMEDOUT;
+	case CPL_ERR_TCAM_FULL:
+		return -ENOMEM;
+	case CPL_ERR_CONN_EXIST:
+		cxgb4i_log_error("ACTIVE_OPEN_RPL: 4-tuple in use\n");
+		return -EADDRINUSE;
+	default:
+		return -EIO;
+	}
+}
+
+/*
+ * Return whether a failed active open has allocated a TID
+ */
+static inline int act_open_has_tid(int status)
+{
+	return status != CPL_ERR_TCAM_FULL && status != CPL_ERR_CONN_EXIST &&
+		status != CPL_ERR_ARP_MISS;
+}
+
+static void cxgb4i_sock_act_open_retry_timer(unsigned long data)
+{
+	struct sk_buff *skb;
+	struct cxgb4i_sock *csk = (struct cxgb4i_sock *)data;
+
+	cxgb4i_conn_debug("csk 0x%p, state %u.\n", csk, csk->state);
+
+	spin_lock_bh(&csk->lock);
+	skb = alloc_skb(sizeof(struct cpl_act_open_req), GFP_ATOMIC);
+	if (!skb)
+		cxgb4i_fail_act_open(csk, -ENOMEM);
+	else {
+		unsigned int qid_atid  = csk->rss_qid << 14;
+		qid_atid |= (unsigned int)csk->atid;
+		skb->sk = (struct sock *)csk;
+		t4_set_arp_err_handler(skb, csk,
+					cxgb4i_act_open_req_arp_failure);
+		cxgb4i_sock_make_act_open_req(csk, skb, qid_atid, csk->l2t);
+		cxgb4_l2t_send(csk->snic->lldi.ports[csk->port_id], skb,
+								csk->l2t);
+	}
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+}
+
+static int cxgb4i_cpl_act_open_rpl(struct cxgb4i_snic *snic,
+						struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_act_open_rpl *rpl = cplhdr(skb);
+	unsigned int atid =
+		GET_TID_TID(GET_AOPEN_ATID(be32_to_cpu(rpl->atid_status)));
+	struct tid_info *t = snic->lldi.tids;
+	unsigned int status = GET_AOPEN_STATUS(be32_to_cpu(rpl->atid_status));
+
+	csk = lookup_atid(t, atid);
+
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", atid);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+
+	cxgb4i_conn_debug("rcv, status 0x%x, csk 0x%p, csk->state %u, "
+			"csk->flag 0x%lx, csk->atid %u.\n",
+			status, csk, csk->state, csk->flags, csk->hwtid);
+
+	if (status & act_open_has_tid(status))
+		cxgb4_remove_tid(snic->lldi.tids, csk->port_id, GET_TID(rpl));
+
+	if (status == CPL_ERR_CONN_EXIST &&
+			csk->retry_timer.function !=
+			cxgb4i_sock_act_open_retry_timer) {
+		csk->retry_timer.function = cxgb4i_sock_act_open_retry_timer;
+		if (!mod_timer(&csk->retry_timer, jiffies + HZ / 2))
+			cxgb4i_sock_hold(csk);
+	} else
+
+		cxgb4i_fail_act_open(csk, act_open_rpl_status_to_errno(status));
+
+	__kfree_skb(skb);
+
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+
+	return 0;
+}
+
+static int cxgb4i_cpl_peer_close(struct cxgb4i_snic *snic, struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_peer_close *req = cplhdr(skb);
+	unsigned int hwtid = GET_TID(req);
+	struct tid_info *t = snic->lldi.tids;
+
+	csk = lookup_tid(t, hwtid);
+
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+
+	if (cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_ABORT_RPL_PENDING))
+		goto out;
+
+	switch (csk->state) {
+	case CXGB4I_CSK_ST_ESTABLISHED:
+		cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_PASSIVE_CLOSE);
+		break;
+	case CXGB4I_CSK_ST_ACTIVE_CLOSE:
+		cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_CLOSE_WAIT_2);
+		break;
+	case CXGB4I_CSK_ST_CLOSE_WAIT_1:
+		cxgb4i_sock_closed(csk);
+		break;
+	case CXGB4I_CSK_ST_ABORTING:
+		break;
+	default:
+		cxgb4i_log_error("peer close, TID %u in bad state %u\n",
+				csk->hwtid, csk->state);
+	}
+
+	cxgb4i_sock_conn_closing(csk);
+
+out:
+	__kfree_skb(skb);
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+
+	return 0;
+}
+
+static int cxgb4i_cpl_close_con_rpl(struct cxgb4i_snic *snic,
+						struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_close_con_rpl *rpl = cplhdr(skb);
+	unsigned int hwtid = GET_TID(rpl);
+	struct tid_info *t = snic->lldi.tids;
+
+	csk = lookup_tid(t, hwtid);
+
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+
+	cxgb4i_conn_debug("csk 0x%p, state %u, flag 0x%lx.\n",
+			csk, csk->state, csk->flags);
+
+	csk->snd_una = ntohl(rpl->snd_nxt) - 1;
+
+	if (cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_ABORT_RPL_PENDING))
+		goto out;
+
+	switch (csk->state) {
+	case CXGB4I_CSK_ST_ACTIVE_CLOSE:
+		cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_CLOSE_WAIT_1);
+		break;
+	case CXGB4I_CSK_ST_CLOSE_WAIT_1:
+	case CXGB4I_CSK_ST_CLOSE_WAIT_2:
+		cxgb4i_sock_closed(csk);
+		break;
+	case CXGB4I_CSK_ST_ABORTING:
+		break;
+	default:
+		cxgb4i_log_error("close_rpl, TID %u in bad state %u\n",
+				csk->hwtid, csk->state);
+	}
+out:
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+	kfree_skb(skb);
+
+	return 0;
+}
+
+static int abort_status_to_errno(struct cxgb4i_sock *csk, int abort_reason,
+								int *need_rst)
+{
+	switch (abort_reason) {
+	case CPL_ERR_BAD_SYN: /* fall through */
+	case CPL_ERR_CONN_RESET:
+		return csk->state > CXGB4I_CSK_ST_ESTABLISHED ?
+			-EPIPE : -ECONNRESET;
+	case CPL_ERR_XMIT_TIMEDOUT:
+	case CPL_ERR_PERSIST_TIMEDOUT:
+	case CPL_ERR_FINWAIT2_TIMEDOUT:
+	case CPL_ERR_KEEPALIVE_TIMEDOUT:
+		return -ETIMEDOUT;
+	default:
+		return -EIO;
+	}
+}
+
+/*
+ * Returns whether an ABORT_REQ_RSS message is a negative advice.
+ */
+static inline int is_neg_adv_abort(unsigned int status)
+{
+	return status == CPL_ERR_RTX_NEG_ADVICE ||
+		status == CPL_ERR_PERSIST_NEG_ADVICE;
+}
+
+static int cxgb4i_cpl_abort_req_rss(struct cxgb4i_snic *snic,
+						struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_abort_req_rss *req = cplhdr(skb);
+	unsigned int hwtid = GET_TID(req);
+	struct tid_info *t = snic->lldi.tids;
+	int rst_status = CPL_ABORT_NO_RST;
+
+	csk = lookup_tid(t, hwtid);
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+
+	if (is_neg_adv_abort(req->status)) {
+		__kfree_skb(skb);
+		return 0;
+	}
+
+	if (!cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_ABORT_REQ_RCVD)) {
+		cxgb4i_sock_set_flag(csk, CXGB4I_CSK_FL_ABORT_REQ_RCVD);
+		cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_ABORTING);
+		__kfree_skb(skb);
+		return 0;
+	}
+
+	cxgb4i_sock_clear_flag(csk, CXGB4I_CSK_FL_ABORT_REQ_RCVD);
+	cxgb4i_sock_send_abort_rpl(csk, rst_status);
+
+	if (!cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_ABORT_RPL_PENDING)) {
+		csk->err = abort_status_to_errno(csk, req->status,
+							&rst_status);
+		cxgb4i_sock_closed(csk);
+	}
+
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+
+	return 0;
+}
+
+static int cxgb4i_cpl_abort_rpl_rss(struct cxgb4i_snic *snic,
+						struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_abort_rpl_rss *rpl = cplhdr(skb);
+	unsigned int hwtid = GET_TID(rpl);
+	struct tid_info *t = snic->lldi.tids;
+
+	if (rpl->status == CPL_ERR_ABORT_FAILED)
+		goto out;
+
+	csk = lookup_tid(t, hwtid);
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		goto out;
+	}
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+
+	if (cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_ABORT_RPL_PENDING)) {
+		if (!cxgb4i_sock_flag(csk, CXGB4I_CSK_FL_ABORT_RPL_RCVD))
+			cxgb4i_sock_set_flag(csk,
+						CXGB4I_CSK_FL_ABORT_RPL_RCVD);
+		else {
+			cxgb4i_sock_clear_flag(csk,
+						CXGB4I_CSK_FL_ABORT_RPL_RCVD);
+			cxgb4i_sock_clear_flag(csk,
+					CXGB4I_CSK_FL_ABORT_RPL_PENDING);
+
+			if (cxgb4i_sock_flag(csk,
+						CXGB4I_CSK_FL_ABORT_REQ_RCVD))
+				cxgb4i_log_error("tid %u, ABORT_RPL_RSS\n",
+						csk->hwtid);
+
+			cxgb4i_sock_closed(csk);
+		}
+	}
+
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+
+out:
+	__kfree_skb(skb);
+	return 0;
+}
+
+static int cxgb4i_cpl_iscsi_hdr(struct cxgb4i_snic *snic, struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_iscsi_hdr *cpl = cplhdr(skb);
+	unsigned int hwtid = GET_TID(cpl);
+	struct tid_info *t = snic->lldi.tids;
+	struct sk_buff *lskb;
+
+	csk = lookup_tid(t, hwtid);
+
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	spin_lock_bh(&csk->lock);
+
+	if (unlikely(csk->state >= CXGB4I_CSK_ST_PASSIVE_CLOSE)) {
+		if (csk->state != CXGB4I_CSK_ST_ABORTING)
+			goto abort_conn;
+	}
+
+	cxgb4i_skb_tcp_seq(skb) = ntohl(cpl->seq);
+	cxgb4i_skb_flags(skb) = 0;
+
+	skb_reset_transport_header(skb);
+	__skb_pull(skb, sizeof(*cpl));
+	__pskb_trim(skb, ntohs(cpl->len));
+
+	if (!csk->skb_ulp_lhdr) {
+		unsigned char *byte;
+		csk->skb_ulp_lhdr = skb;
+		lskb = csk->skb_ulp_lhdr;
+
+		cxgb4i_skb_flags(lskb) = CXGB4I_SKCB_FLAG_HDR_RCVD;
+
+		if (cxgb4i_skb_tcp_seq(lskb) != csk->rcv_nxt) {
+			cxgb4i_log_error("tid 0x%x, CPL_ISCSI_HDR, bad seq got "
+					"0x%x, exp 0x%x\n",
+					csk->hwtid,
+					cxgb4i_skb_tcp_seq(lskb),
+					csk->rcv_nxt);
+		}
+
+		byte = skb->data;
+		cxgb4i_skb_rx_pdulen(skb) = ntohs(cpl->pdu_len_ddp) - 40;
+		csk->rcv_nxt += cxgb4i_skb_rx_pdulen(lskb);
+	} else {
+		lskb = csk->skb_ulp_lhdr;
+		cxgb4i_skb_flags(lskb) |= CXGB4I_SKCB_FLAG_DATA_RCVD;
+		cxgb4i_skb_flags(skb) = CXGB4I_SKCB_FLAG_DATA_RCVD;
+		cxgb4i_log_debug("csk 0x%p, tid 0x%x skb 0x%p, pdu data, "
+				" header 0x%p.\n",
+				csk, csk->hwtid, skb, lskb);
+	}
+
+	__skb_queue_tail(&csk->receive_queue, skb);
+
+	spin_unlock_bh(&csk->lock);
+
+	return 0;
+
+abort_conn:
+	cxgb4i_sock_send_abort_req(csk);
+	__kfree_skb(skb);
+	spin_unlock_bh(&csk->lock);
+
+	return -EINVAL;
+}
+
+static int cxgb4i_cpl_rx_data_ddp(struct cxgb4i_snic *snic, struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct sk_buff *lskb;
+	struct cpl_rx_data_ddp *rpl = cplhdr(skb);
+	unsigned int hwtid = GET_TID(rpl);
+	struct tid_info *t = snic->lldi.tids;
+	unsigned int status;
+
+	csk = lookup_tid(t, hwtid);
+
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	spin_lock_bh(&csk->lock);
+
+	if (unlikely(csk->state >= CXGB4I_CSK_ST_PASSIVE_CLOSE)) {
+		if (csk->state != CXGB4I_CSK_ST_ABORTING)
+			goto abort_conn;
+	}
+
+	if (!csk->skb_ulp_lhdr) {
+		cxgb4i_log_error("tid 0x%x, rcv RX_DATA_DDP w/o pdu header\n",
+				csk->hwtid);
+		goto abort_conn;
+	}
+
+	lskb = csk->skb_ulp_lhdr;
+	cxgb4i_skb_flags(lskb) |= CXGB4I_SKCB_FLAG_STATUS_RCVD;
+
+	if (ntohs(rpl->len) != cxgb4i_skb_rx_pdulen(lskb)) {
+		cxgb4i_log_error("tid 0x%x, RX_DATA_DDP pdulen %u != %u.\n",
+				csk->hwtid, ntohs(rpl->len),
+				cxgb4i_skb_rx_pdulen(lskb));
+	}
+
+	cxgb4i_skb_rx_ddigest(lskb) = ntohl(rpl->ulp_crc);
+	status = ntohl(rpl->ddpvld);
+
+	if (status & (1 << RX_DDP_STATUS_HCRC_SHIFT))
+		cxgb4i_skb_ulp_mode(skb) |= ULP2_FLAG_HCRC_ERROR;
+	if (status & (1 << RX_DDP_STATUS_DCRC_SHIFT))
+		cxgb4i_skb_ulp_mode(skb) |= ULP2_FLAG_DCRC_ERROR;
+	if (status & (1 << RX_DDP_STATUS_PAD_SHIFT))
+		cxgb4i_skb_ulp_mode(skb) |= ULP2_FLAG_PAD_ERROR;
+	if ((cxgb4i_skb_flags(lskb) & ULP2_FLAG_DATA_READY))
+		cxgb4i_skb_ulp_mode(skb) |= ULP2_FLAG_DATA_DDPED;
+
+	csk->skb_ulp_lhdr = NULL;
+
+	__kfree_skb(skb);
+	cxgbi_conn_pdu_ready(csk);
+	spin_unlock_bh(&csk->lock);
+
+	return 0;
+
+abort_conn:
+	cxgb4i_sock_send_abort_req(csk);
+	__kfree_skb(skb);
+	spin_unlock_bh(&csk->lock);
+	return -EINVAL;
+}
+
+static void check_wr_invariants(const struct cxgb4i_sock *csk)
+{
+	int pending = cxgb4i_sock_count_pending_wrs(csk);
+
+	if (unlikely(csk->wr_cred + pending != csk->wr_max_cred))
+		printk(KERN_ERR "TID %u: credit imbalance: avail %u, "
+				"pending %u, total should be %u\n",
+				csk->hwtid,
+				csk->wr_cred,
+				pending,
+				csk->wr_max_cred);
+}
+
+static int cxgb4i_cpl_fw4_ack(struct cxgb4i_snic *snic, struct sk_buff *skb)
+{
+	struct cxgb4i_sock *csk;
+	struct cpl_fw4_ack *rpl = cplhdr(skb);
+	unsigned int hwtid = GET_TID(rpl);
+	struct tid_info *t = snic->lldi.tids;
+	unsigned char credits;
+	unsigned int snd_una;
+
+	csk = lookup_tid(t, hwtid);
+	if (unlikely(!csk)) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		kfree_skb(skb);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+
+	credits = rpl->credits;
+	snd_una = be32_to_cpu(rpl->snd_una);
+
+	cxgb4i_tx_debug("%u WR credits, avail %u, unack %u, TID %u, state %u\n",
+				credits, csk->wr_cred, csk->wr_una_cred,
+						csk->hwtid, csk->state);
+
+	csk->wr_cred += credits;
+
+	if (csk->wr_una_cred > csk->wr_max_cred - csk->wr_cred)
+		csk->wr_una_cred = csk->wr_max_cred - csk->wr_cred;
+
+	while (credits) {
+		struct sk_buff *p = cxgb4i_sock_peek_wr(csk);
+
+		if (unlikely(!p)) {
+			cxgb4i_log_error("%u WR_ACK credits for TID %u with "
+					"nothing pending, state %u\n",
+					credits, csk->hwtid, csk->state);
+			break;
+		}
+
+		if (unlikely(credits < p->csum)) {
+			p->csum -= credits;
+		} else {
+			cxgb4i_sock_dequeue_wr(csk);
+			credits -= p->csum;
+			cxgb4i_sock_free_wr_skb(p);
+		}
+	}
+
+	check_wr_invariants(csk);
+
+	if (rpl->seq_vld) {
+		if (unlikely(before(snd_una, csk->snd_una))) {
+			cxgb4i_log_error("TID %u, unexpected sequence # %u "
+					"in WR_ACK snd_una %u\n",
+					csk->hwtid, snd_una, csk->snd_una);
+			goto out_free;
+		}
+	}
+
+	if (csk->snd_una != snd_una) {
+		csk->snd_una = snd_una;
+		dst_confirm(csk->dst);
+	}
+
+	if (skb_queue_len(&csk->write_queue)) {
+		if (cxgb4i_sock_push_tx_frames(csk, 0))
+			cxgbi_conn_tx_open(csk);
+	} else
+		cxgbi_conn_tx_open(csk);
+
+	goto out;
+
+out_free:
+
+	__kfree_skb(skb);
+
+out:
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+
+	return 0;
+}
+
+static int cxgb4i_cpl_set_tcb_rpl(struct cxgb4i_snic *snic, struct sk_buff *skb)
+{
+	struct cpl_set_tcb_rpl *rpl = cplhdr(skb);
+	unsigned int hwtid = GET_TID(rpl);
+	struct tid_info *t = snic->lldi.tids;
+	struct cxgb4i_sock *csk;
+
+	csk = lookup_tid(t, hwtid);
+
+	if (!csk) {
+		cxgb4i_log_error("can't find connection for tid %u\n", hwtid);
+		__kfree_skb(skb);
+		return CPL_RET_UNKNOWN_TID;
+	}
+
+	spin_lock_bh(&csk->lock);
+
+	if (rpl->status != CPL_ERR_NONE) {
+		cxgb4i_log_error("Unexpected SET_TCB_RPL status %u "
+				 "for tid %u\n", rpl->status, GET_TID(rpl));
+	}
+
+	__kfree_skb(skb);
+	spin_unlock_bh(&csk->lock);
+
+	return 0;
+}
+
+static void cxgb4i_sock_free_cpl_skbs(struct cxgb4i_sock *csk)
+{
+	if (csk->cpl_close)
+		kfree_skb(csk->cpl_close);
+	if (csk->cpl_abort_req)
+		kfree_skb(csk->cpl_abort_req);
+	if (csk->cpl_abort_rpl)
+		kfree_skb(csk->cpl_abort_rpl);
+}
+
+static int cxgb4i_alloc_cpl_skbs(struct cxgb4i_sock *csk)
+{
+	csk->cpl_close = alloc_skb(sizeof(struct cpl_close_con_req),
+					GFP_KERNEL);
+	if (!csk->cpl_close)
+		return -ENOMEM;
+	skb_put(csk->cpl_close, sizeof(struct cpl_close_con_req));
+
+	csk->cpl_abort_req = alloc_skb(sizeof(struct cpl_abort_req),
+					GFP_KERNEL);
+	if (!csk->cpl_abort_req)
+		goto free_cpl_skbs;
+	skb_put(csk->cpl_abort_req, sizeof(struct cpl_abort_req));
+
+	csk->cpl_abort_rpl = alloc_skb(sizeof(struct cpl_abort_rpl),
+					GFP_KERNEL);
+	if (!csk->cpl_abort_rpl)
+		goto free_cpl_skbs;
+	skb_put(csk->cpl_abort_rpl, sizeof(struct cpl_abort_rpl));
+
+	return 0;
+
+free_cpl_skbs:
+	cxgb4i_sock_free_cpl_skbs(csk);
+	return -ENOMEM;
+}
+
+static void cxgb4i_sock_release_offload_resources(struct cxgb4i_sock *csk)
+{
+
+	cxgb4i_sock_free_cpl_skbs(csk);
+
+	if (csk->wr_cred != csk->wr_max_cred) {
+		cxgb4i_sock_purge_wr_queue(csk);
+		cxgb4i_sock_reset_wr_list(csk);
+	}
+
+	if (csk->l2t) {
+		cxgb4_l2t_release(csk->l2t);
+		csk->l2t = NULL;
+	}
+
+	if (csk->state == CXGB4I_CSK_ST_CONNECTING)
+		cxgb4i_sock_free_atid(csk);
+	else {
+		cxgb4_remove_tid(csk->snic->lldi.tids, 0, csk->hwtid);
+		cxgb4i_sock_put(csk);
+	}
+
+	csk->dst = NULL;
+	csk->snic = NULL;
+}
+
+struct cxgb4i_sock *cxgb4i_sock_create(struct cxgb4i_snic *snic)
+{
+	struct cxgb4i_sock *csk = NULL;
+
+	csk = kzalloc(sizeof(*csk), GFP_KERNEL);
+	if (!csk)
+		return NULL;
+
+	if (cxgb4i_alloc_cpl_skbs(csk) < 0)
+		goto free_csk;
+
+	cxgb4i_conn_debug("alloc csk: 0x%p\n", csk);
+
+	csk->flags = 0;
+	spin_lock_init(&csk->lock);
+	atomic_set(&csk->refcnt, 1);
+	skb_queue_head_init(&csk->receive_queue);
+	skb_queue_head_init(&csk->write_queue);
+	setup_timer(&csk->retry_timer, NULL, (unsigned long)csk);
+	rwlock_init(&csk->callback_lock);
+	csk->snic = snic;
+
+	return csk;
+
+free_csk:
+	cxgb4i_api_debug("csk alloc failed %p, baling out\n", csk);
+	kfree(csk);
+	return NULL;
+}
+
+static void cxgb4i_sock_active_close(struct cxgb4i_sock *csk)
+{
+	int data_lost;
+	int close_req = 0;
+
+	cxgb4i_conn_debug("csk 0x%p, state %u, flags %lu\n",
+			csk, csk->state, csk->flags);
+
+	dst_confirm(csk->dst);
+
+	cxgb4i_sock_hold(csk);
+	spin_lock_bh(&csk->lock);
+
+	data_lost = skb_queue_len(&csk->receive_queue);
+	__skb_queue_purge(&csk->receive_queue);
+
+	switch (csk->state) {
+	case CXGB4I_CSK_ST_CLOSED:
+	case CXGB4I_CSK_ST_ACTIVE_CLOSE:
+	case CXGB4I_CSK_ST_CLOSE_WAIT_1:
+	case CXGB4I_CSK_ST_CLOSE_WAIT_2:
+	case CXGB4I_CSK_ST_ABORTING:
+		break;
+
+	case CXGB4I_CSK_ST_CONNECTING:
+		cxgb4i_sock_set_flag(csk, CXGB4I_CSK_FL_ACTIVE_CLOSE_NEEDED);
+		break;
+	case CXGB4I_CSK_ST_ESTABLISHED:
+		close_req = 1;
+		cxgb4i_sock_set_flag(csk, CXGB4I_CSK_ST_CLOSE_WAIT_2);
+		break;
+	}
+
+	if (close_req) {
+		if (data_lost)
+			cxgb4i_sock_send_abort_req(csk);
+		else
+			cxgb4i_sock_send_close_req(csk);
+	}
+
+	spin_unlock_bh(&csk->lock);
+	cxgb4i_sock_put(csk);
+}
+
+void cxgb4i_sock_release(struct cxgb4i_sock *csk)
+{
+	cxgb4i_conn_debug("csk 0x%p, state %u, flags %lu\n",
+			csk, csk->state, csk->flags);
+
+	if (unlikely(csk->state == CXGB4I_CSK_ST_CONNECTING))
+		cxgb4i_sock_set_state(csk,
+				CXGB4I_CSK_FL_ACTIVE_CLOSE_NEEDED);
+	else if (likely(csk->state != CXGB4I_CSK_ST_CLOSED))
+		cxgb4i_sock_active_close(csk);
+
+	cxgb4i_sock_put(csk);
+}
+
+static int is_cxgb4_dev(struct net_device *dev, struct cxgb4i_snic *snic)
+{
+	struct net_device *ndev = dev;
+	int i;
+
+	if (dev->priv_flags & IFF_802_1Q_VLAN)
+		ndev = vlan_dev_real_dev(dev);
+
+	for (i = 0; i < snic->lldi.nports; i++) {
+		if (ndev == snic->lldi.ports[i])
+			return 1;
+	}
+
+	return 0;
+}
+
+static struct net_device *cxgb4i_find_egress_dev(struct net_device *root_dev,
+						struct cxgb4i_snic *snic)
+{
+	while (root_dev) {
+		if (root_dev->priv_flags & IFF_802_1Q_VLAN)
+			root_dev = vlan_dev_real_dev(root_dev);
+		else if (is_cxgb4_dev(root_dev, snic))
+			return root_dev;
+		else
+			return NULL;
+	}
+
+	return NULL;
+}
+
+static struct rtable *find_route(struct net_device *dev,
+				__be32 saddr, __be32 daddr,
+				__be16 sport, __be16 dport,
+				u8 tos)
+{
+	struct rtable *rt;
+	struct flowi fl = {
+		.oif = dev ? dev->ifindex : 0,
+		.nl_u = {
+			.ip4_u = {
+				.daddr = daddr,
+				.saddr = saddr,
+				.tos = tos }
+			},
+		.proto = IPPROTO_TCP,
+		.uli_u = {
+			.ports = {
+				.sport = sport,
+				.dport = dport }
+			}
+	};
+
+	if (ip_route_output_flow(dev ? dev_net(dev) : &init_net,
+					&rt, &fl, NULL, 0))
+		return NULL;
+
+	return rt;
+}
+
+static int cxgb4i_init_act_open(struct cxgb4i_sock *csk,
+					struct net_device *dev)
+{
+	struct dst_entry *dst = csk->dst;
+	struct sk_buff *skb;
+	struct port_info *pi = netdev_priv(dev);
+
+	cxgb4i_conn_debug("csk 0x%p, state %u, flags 0x%lx\n",
+			csk, csk->state, csk->flags);
+
+	csk->atid = cxgb4_alloc_atid(csk->snic->lldi.tids, csk);
+	if (csk->atid == -1) {
+		cxgb4i_log_error("cannot alloc atid\n");
+		goto out_err;
+	}
+
+	csk->l2t = cxgb4_l2t_get(csk->snic->lldi.l2t,
+				csk->dst->neighbour,
+				dev, 0);
+	if (!csk->l2t) {
+		cxgb4i_log_error("cannot alloc l2t\n");
+		goto free_atid;
+	}
+
+	skb = alloc_skb(sizeof(struct cpl_act_open_req), GFP_KERNEL);
+	if (!skb)
+		goto free_l2t;
+
+	skb->sk = (struct sock *)csk;
+	t4_set_arp_err_handler(skb, csk, cxgb4i_act_open_req_arp_failure);
+
+	cxgb4i_sock_hold(csk);
+
+	csk->wr_max_cred = csk->wr_cred = csk->snic->lldi.wr_cred;
+	csk->port_id = pi->port_id;
+	csk->rss_qid = csk->snic->lldi.rxq_ids[csk->port_id];
+	csk->tx_chan = pi->tx_chan;
+	csk->smac_idx = csk->tx_chan << 1;
+	csk->wr_una_cred = 0;
+	csk->mss_idx = cxgb4i_select_mss(csk, dst_mtu(dst));
+	csk->err = 0;
+
+	cxgb4i_sock_reset_wr_list(csk);
+
+	cxgb4i_sock_make_act_open_req(csk, skb,
+					((csk->rss_qid << 14) |
+					 (csk->atid)), csk->l2t);
+	cxgb4_l2t_send(csk->snic->lldi.ports[csk->port_id], skb, csk->l2t);
+	return 0;
+
+free_l2t:
+	cxgb4_l2t_release(csk->l2t);
+
+free_atid:
+	cxgb4_free_atid(csk->snic->lldi.tids, csk->atid);
+
+out_err:
+
+	return -EINVAL;;
+}
+
+static struct net_device *cxgb4i_find_dev(struct net_device *dev,
+							__be32 ipaddr)
+{
+	struct flowi fl;
+	struct rtable *rt;
+	int err;
+
+	memset(&fl, 0, sizeof(fl));
+	fl.nl_u.ip4_u.daddr = ipaddr;
+
+	err = ip_route_output_key(dev ? dev_net(dev) : &init_net, &rt, &fl);
+	if (!err)
+		return (&rt->u.dst)->dev;
+
+	return NULL;
+}
+
+int cxgb4i_sock_connect(struct net_device *dev, struct cxgb4i_sock *csk,
+						struct sockaddr_in *sin)
+{
+	struct rtable *rt;
+	__be32 sipv4 = 0;
+	struct net_device *dstdev;
+	struct cxgb4i_hba *chba = NULL;
+	int err;
+
+	cxgb4i_conn_debug("csk 0x%p, dev 0x%p\n", csk, dev);
+
+	if (sin->sin_family != AF_INET)
+		return -EAFNOSUPPORT;
+
+	csk->daddr.sin_port = sin->sin_port;
+	csk->daddr.sin_addr.s_addr = sin->sin_addr.s_addr;
+
+	dstdev = cxgb4i_find_dev(dev, sin->sin_addr.s_addr);
+	if (!dstdev || !is_cxgb4_dev(dstdev, csk->snic))
+		return -ENETUNREACH;
+
+	if (dstdev->priv_flags & IFF_802_1Q_VLAN)
+		dev = dstdev;
+
+	rt = find_route(dev, csk->saddr.sin_addr.s_addr,
+			csk->daddr.sin_addr.s_addr,
+			csk->saddr.sin_port,
+			csk->daddr.sin_port,
+			0);
+	if (rt == NULL) {
+		cxgb4i_conn_debug("no route to %pI4, port %u, dev %s, "
+					"snic 0x%p\n",
+					&csk->daddr.sin_addr.s_addr,
+					ntohs(csk->daddr.sin_port),
+					dev ? dev->name : "any",
+					csk->snic);
+		return -ENETUNREACH;
+	}
+
+	if (rt->rt_flags & (RTCF_MULTICAST | RTCF_BROADCAST)) {
+		cxgb4i_conn_debug("multi-cast route to %pI4, port %u, "
+					"dev %s, snic 0x%p\n",
+					&csk->daddr.sin_addr.s_addr,
+					ntohs(csk->daddr.sin_port),
+					dev ? dev->name : "any",
+					csk->snic);
+		ip_rt_put(rt);
+		return -ENETUNREACH;
+	}
+
+	if (!csk->saddr.sin_addr.s_addr)
+		csk->saddr.sin_addr.s_addr = rt->rt_src;
+
+	csk->dst = &rt->u.dst;
+
+	dev = cxgb4i_find_egress_dev(csk->dst->dev, csk->snic);
+	if (dev == NULL) {
+		cxgb4i_conn_debug("csk: 0x%p, egress dev NULL\n", csk);
+		return -ENETUNREACH;
+	}
+
+	err = cxgb4i_sock_get_port(csk);
+	if (err)
+		return err;
+
+	cxgb4i_conn_debug("csk: 0x%p get port: %u\n",
+			csk, ntohs(csk->saddr.sin_port));
+
+	chba = cxgb4i_hba_find_by_netdev(csk->dst->dev);
+
+	sipv4 = cxgb4i_get_iscsi_ipv4(chba);
+	if (!sipv4) {
+		cxgb4i_conn_debug("csk: 0x%p, iscsi is not configured\n", csk);
+		sipv4 = csk->saddr.sin_addr.s_addr;
+		cxgb4i_set_iscsi_ipv4(chba, sipv4);
+	} else
+		csk->saddr.sin_addr.s_addr = sipv4;
+
+	cxgb4i_conn_debug("csk: 0x%p, %pI4:[%u], %pI4:[%u] SYN_SENT\n",
+				csk,
+				&csk->saddr.sin_addr.s_addr,
+				ntohs(csk->saddr.sin_port),
+				&csk->daddr.sin_addr.s_addr,
+				ntohs(csk->daddr.sin_port));
+
+	cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_CONNECTING);
+
+	if (!cxgb4i_init_act_open(csk, dev))
+		return 0;
+
+	err = -ENOTSUPP;
+
+	cxgb4i_conn_debug("csk 0x%p -> closed\n", csk);
+	cxgb4i_sock_set_state(csk, CXGB4I_CSK_ST_CLOSED);
+	ip_rt_put(rt);
+	cxgb4i_sock_put_port(csk);
+
+	return err;
+}
+
+void cxgb4i_sock_rx_credits(struct cxgb4i_sock *csk, int copied)
+{
+	int must_send;
+	u32 credits;
+
+	if (csk->state != CXGB4I_CSK_ST_ESTABLISHED)
+		return;
+
+	credits = csk->copied_seq - csk->rcv_wup;
+	if (unlikely(!credits))
+		return;
+
+	if (unlikely(cxgb4i_rx_credit_thres == 0))
+		return;
+
+	must_send = credits + 16384 >= cxgb4i_rcv_win;
+
+	if (must_send || credits >= cxgb4i_rx_credit_thres)
+		csk->rcv_wup += cxgb4i_csk_send_rx_credits(csk, credits);
+}
+
+int cxgb4i_sock_send_pdus(struct cxgb4i_sock *csk, struct sk_buff *skb)
+{
+	struct sk_buff *next;
+	int err, copied = 0;
+
+	spin_lock_bh(&csk->lock);
+
+	if (csk->state != CXGB4I_CSK_ST_ESTABLISHED) {
+		cxgb4i_tx_debug("csk 0x%p, not in est. state %u.\n",
+			      csk, csk->state);
+		err = -EAGAIN;
+		goto out_err;
+	}
+
+	if (csk->err) {
+		cxgb4i_tx_debug("csk 0x%p, err %d.\n", csk, csk->err);
+		err = -EPIPE;
+		goto out_err;
+	}
+
+	if (csk->write_seq - csk->snd_una >= cxgb4i_snd_win) {
+		cxgb4i_tx_debug("csk 0x%p, snd %u - %u > %u.\n",
+				csk, csk->write_seq, csk->snd_una,
+				cxgb4i_snd_win);
+		err = -ENOBUFS;
+		goto out_err;
+	}
+
+	while (skb) {
+		int frags = skb_shinfo(skb)->nr_frags +
+				(skb->len != skb->data_len);
+
+		if (unlikely(skb_headroom(skb) < TX_HEADER_LEN)) {
+			cxgb4i_tx_debug("csk 0x%p, skb head.\n", csk);
+			err = -EINVAL;
+			goto out_err;
+		}
+
+		if (frags >= SKB_WR_LIST_SIZE) {
+			cxgb4i_log_error("csk 0x%p, tx frags %d, len %u,%u.\n",
+					 csk, skb_shinfo(skb)->nr_frags,
+					 skb->len, skb->data_len);
+			err = -EINVAL;
+			goto out_err;
+		}
+
+		next = skb->next;
+		skb->next = NULL;
+		cxgb4i_sock_skb_entail(csk, skb,
+				CXGB4I_SKCB_FLAG_NO_APPEND |
+				CXGB4I_SKCB_FLAG_NEED_HDR);
+		copied += skb->len;
+		csk->write_seq += skb->len + ulp_extra_len(skb);
+		skb = next;
+	}
+done:
+	if (likely(skb_queue_len(&csk->write_queue)))
+		cxgb4i_sock_push_tx_frames(csk, 1);
+	spin_unlock_bh(&csk->lock);
+	return copied;
+
+out_err:
+	if (copied == 0 && err == -EPIPE)
+		copied = csk->err ? csk->err : -EPIPE;
+	else
+		copied = err;
+	goto done;
+}
+
+static void cxgb4i_sock_conn_closing(struct cxgb4i_sock *csk)
+{
+	struct iscsi_conn *conn = csk->user_data;
+
+	read_lock(&csk->callback_lock);
+	if (conn && csk->state != CXGB4I_CSK_ST_ESTABLISHED)
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+	read_unlock(&csk->callback_lock);
+}
+
+int cxgb4i_ofld_init(struct cxgb4i_snic *snic)
+{
+	struct cxgb4i_ports_map *ports;
+	int mapsize = (cxgb4i_max_connect * sizeof(struct cxgb4i_sock));
+
+	ports = cxgb4i_alloc_big_mem(sizeof(*ports) + mapsize, GFP_KERNEL);
+	if (!ports)
+		return -ENOMEM;
+
+	spin_lock_init(&ports->lock);
+	snic->pmap = ports;
+
+	snic->funcs[CPL_ACT_ESTABLISH] = cxgb4i_cpl_act_establish;
+	snic->funcs[CPL_ACT_OPEN_RPL] = cxgb4i_cpl_act_open_rpl;
+	snic->funcs[CPL_PEER_CLOSE] = cxgb4i_cpl_peer_close;
+	snic->funcs[CPL_ABORT_REQ_RSS] = cxgb4i_cpl_abort_req_rss;
+	snic->funcs[CPL_ABORT_RPL_RSS] = cxgb4i_cpl_abort_rpl_rss;
+	snic->funcs[CPL_CLOSE_CON_RPL] = cxgb4i_cpl_close_con_rpl;
+	snic->funcs[CPL_FW4_ACK] = cxgb4i_cpl_fw4_ack;
+	snic->funcs[CPL_ISCSI_HDR] = cxgb4i_cpl_iscsi_hdr;
+	snic->funcs[CPL_SET_TCB_RPL] = cxgb4i_cpl_set_tcb_rpl;
+	snic->funcs[CPL_RX_DATA_DDP] = cxgb4i_cpl_rx_data_ddp;
+
+	return 0;
+}
+
+void cxgb4i_ofld_cleanup(struct cxgb4i_snic *snic)
+{
+	cxgb4i_free_big_mem(snic->pmap);
+}
+
diff --git a/drivers/scsi/cxgb4i/cxgb4i_offload.h b/drivers/scsi/cxgb4i/cxgb4i_offload.h
new file mode 100644
index 0000000..40ca066
--- /dev/null
+++ b/drivers/scsi/cxgb4i/cxgb4i_offload.h
@@ -0,0 +1,171 @@
+/*
+ * cxgb4i_offload.h: Chelsio T4 iSCSI driver.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#ifndef	__CXGB4I_OFFLOAD_H__
+#define	__CXGB4I_OFFLOAD_H__
+
+#include <linux/skbuff.h>
+#include <linux/in.h>
+
+#define	CXGB4I_MAX_CONN	0xFFF
+
+enum {
+	CPL_RET_BUF_DONE = 1,
+	CPL_RET_BAD_MSG = 2,
+	CPL_RET_UNKNOWN_TID = 4
+};
+
+struct cxgb4i_sock {
+	struct net_device *egdev;
+	struct cxgb4i_snic *snic;
+
+	unsigned long flags;
+	unsigned int qset;
+	unsigned int rss_qid;
+
+	unsigned int hwtid;
+	unsigned int atid;
+
+	unsigned int tx_chan;
+	unsigned int rx_chan;
+	unsigned int mss_idx;
+	unsigned int smac_idx;
+	unsigned char port_id;
+
+	struct l2t_entry *l2t;
+
+	int wr_max_cred;
+	int wr_cred;
+	int wr_una_cred;
+
+	struct sk_buff *wr_pending_head;
+	struct sk_buff *wr_pending_tail;
+	struct sk_buff *cpl_close;
+	struct sk_buff *cpl_abort_req;
+	struct sk_buff *cpl_abort_rpl;
+	struct sk_buff *skb_ulp_lhdr;
+	spinlock_t lock;
+	atomic_t refcnt;
+	volatile unsigned int state;
+	struct sockaddr_in saddr;
+	struct sockaddr_in daddr;
+	struct dst_entry *dst;
+	struct sk_buff_head receive_queue;
+	struct sk_buff_head write_queue;
+	struct timer_list retry_timer;
+	int err;
+	rwlock_t callback_lock;
+	void *user_data;
+
+	u32 rcv_nxt;
+	u32 copied_seq;
+	u32 rcv_wup;
+	u32 snd_nxt;
+	u32 snd_una;
+	u32 write_seq;
+};
+
+enum cxgb4i_sock_states{
+	CXGB4I_CSK_ST_CONNECTING = 1,
+	CXGB4I_CSK_ST_ESTABLISHED,
+	CXGB4I_CSK_ST_ACTIVE_CLOSE,
+	CXGB4I_CSK_ST_PASSIVE_CLOSE,
+	CXGB4I_CSK_ST_CLOSE_WAIT_1,
+	CXGB4I_CSK_ST_CLOSE_WAIT_2,
+	CXGB4I_CSK_ST_ABORTING,
+	CXGB4I_CSK_ST_CLOSED,
+};
+
+static inline unsigned int cxgb4i_sock_is_closing(
+			const struct cxgb4i_sock *csk)
+{
+	return csk->state >= CXGB4I_CSK_ST_ACTIVE_CLOSE;
+}
+
+static inline unsigned int cxgb4i_sock_is_established(
+				const struct cxgb4i_sock *csk)
+{
+	return csk->state == CXGB4I_CSK_ST_ESTABLISHED;
+}
+
+enum cxgb4i_sock_flags {
+	CXGB4I_CSK_FL_ABORT_RPL_RCVD,	/*received one ABORT_RPL_RSS message */
+	CXGB4I_CSK_FL_ABORT_REQ_RCVD,	/*received one ABORT_REQ_RSS message */
+	CXGB4I_CSK_FL_ABORT_RPL_PENDING,	/* expecting an abort reply */
+	CXGB4I_CSK_FL_TX_DATA_SENT,	/* already sent a TX_DATA WR */
+	CXGB4I_CSK_FL_ACTIVE_CLOSE_NEEDED,	/* need to be closed */
+	CXGB4I_CSK_FL_OFFLOAD_DOWN		/* offload function off */
+};
+
+struct cxgb4i_sock *cxgb4i_sock_create(struct cxgb4i_snic *);
+void cxgb4i_sock_release(struct cxgb4i_sock *);
+int cxgb4i_sock_connect(struct net_device *, struct cxgb4i_sock *,
+			struct sockaddr_in *);
+void cxgb4i_sock_rx_credits(struct cxgb4i_sock *, int);
+int cxgb4i_sock_send_pdus(struct cxgb4i_sock *, struct sk_buff *);
+
+struct cxgb4i_skb_rx_cb {
+	__u32 ddigest;
+	__u32 pdulen;
+};
+
+struct cxgb4i_skb_tx_cb {
+	struct l2t_skb_cb l2t;
+	struct sk_buff *wr_next;
+};
+
+struct cxgb4i_skb_cb {
+	__u16 flags;
+	__u16 ulp_mode;
+	__u32 seq;
+
+	union {
+		struct cxgb4i_skb_rx_cb rx;
+		struct cxgb4i_skb_tx_cb tx;
+	};
+};
+
+#define CXGB4I_SKB_CB(skb)	((struct cxgb4i_skb_cb *)&((skb)->cb[0]))
+#define cxgb4i_skb_flags(skb)	(CXGB4I_SKB_CB(skb)->flags)
+#define cxgb4i_skb_ulp_mode(skb)	(CXGB4I_SKB_CB(skb)->ulp_mode)
+#define cxgb4i_skb_tcp_seq(skb)		(CXGB4I_SKB_CB(skb)->seq)
+#define cxgb4i_skb_rx_ddigest(skb)	(CXGB4I_SKB_CB(skb)->rx.ddigest)
+#define cxgb4i_skb_rx_pdulen(skb)	(CXGB4I_SKB_CB(skb)->rx.pdulen)
+#define cxgb4i_skb_tx_wr_next(skb)	(CXGB4I_SKB_CB(skb)->tx.wr_next)
+
+enum cxgb4i_skcb_flags {
+	CXGB4I_SKCB_FLAG_NEED_HDR = 1 << 0,	/* packet needs a header */
+	CXGB4I_SKCB_FLAG_NO_APPEND = 1 << 1,	/* don't grow this skb */
+	CXGB4I_SKCB_FLAG_COMPL = 1 << 2,	/* request WR completion */
+	CXGB4I_SKCB_FLAG_HDR_RCVD = 1 << 3,	/* recieved header pdu */
+	CXGB4I_SKCB_FLAG_DATA_RCVD = 1 << 4,	/*  recieved data pdu */
+	CXGB4I_SKCB_FLAG_STATUS_RCVD = 1 << 5,	/* recieved ddp status */
+};
+
+/*
+ * sge_opaque_hdr -
+ * Opaque version of structure the SGE stores at skb->head of TX_DATA packets
+ * and for which we must reserve space.
+ */
+struct sge_opaque_hdr {
+	void *dev;
+	dma_addr_t addr[MAX_SKB_FRAGS + 1];
+};
+
+/* for TX: a skb must have a headroom of at least TX_HEADER_LEN bytes */
+#define TX_HEADER_LEN \
+	(sizeof(struct fw_ofld_tx_data_wr) + sizeof(struct sge_opaque_hdr))
+#define SKB_TX_HEADROOM	SKB_MAX_HEAD(TX_HEADER_LEN)
+
+#endif	/* __CXGB4I_OFFLOAD_H__ */
+
diff --git a/drivers/scsi/cxgb4i/cxgb4i_snic.c b/drivers/scsi/cxgb4i/cxgb4i_snic.c
new file mode 100644
index 0000000..36be666
--- /dev/null
+++ b/drivers/scsi/cxgb4i/cxgb4i_snic.c
@@ -0,0 +1,253 @@
+/*
+ * cxgb4i_snic.c: Chelsio T4 iSCSI driver.
+ *
+ * Copyright (c) 2010 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * Written by: Karen Xie (kxie-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ * Written by: Rakesh Ranjan (rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org)
+ */
+
+#include <net/route.h>
+
+#include "cxgb4i.h"
+
+#define	DRV_MODULE_NAME		"cxgb4i"
+#define	DRV_MODULE_VERSION	"0.90"
+#define	DRV_MODULE_RELDATE	"04/08/2010"
+
+static char version[] =
+	"Chelsio T4 iSCSI driver " DRV_MODULE_NAME
+	" v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
+
+MODULE_AUTHOR("Rakesh Ranjan <rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>");
+MODULE_DESCRIPTION("Chelsio T4 iSCSI driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(DRV_MODULE_VERSION);
+
+
+static LIST_HEAD(snic_list);
+static DEFINE_MUTEX(snic_rwlock);
+
+static cxgb4i_cplhandler_func cxgb4i_cplhandlers[NUM_CPL_CMDS];
+static void *cxgb4i_uld_add(const struct cxgb4_lld_info *linfo);
+static int cxgb4i_uld_rx_handler(void *handle, const __be64 *rsp,
+				const struct pkt_gl *pgl);
+static int cxgb4i_uld_state_change(void *handle, enum cxgb4_state state);
+
+static struct cxgb4_uld_info cxgb4i_uld_info = {
+	.name = "cxgb4i",
+	.add = cxgb4i_uld_add,
+	.rx_handler = cxgb4i_uld_rx_handler,
+	.state_change = cxgb4i_uld_state_change,
+};
+
+struct cxgb4i_hba *cxgb4i_hba_find_by_netdev(struct net_device *dev)
+{
+	int i;
+	struct cxgb4i_snic *snic = NULL;;
+
+	if (dev->priv_flags & IFF_802_1Q_VLAN)
+		dev = vlan_dev_real_dev(dev);
+
+	mutex_lock(&snic_rwlock);
+	list_for_each_entry(snic, &snic_list, list_head) {
+		for (i = 0; i < snic->hba_cnt; i++) {
+			if (snic->hba[i]->ndev == dev) {
+				mutex_unlock(&snic_rwlock);
+				return snic->hba[i];
+			}
+		}
+	}
+	mutex_unlock(&snic_rwlock);
+	return NULL;
+}
+
+struct cxgb4i_snic *cxgb4i_find_snic(struct net_device *dev, __be32 ipaddr)
+{
+	struct flowi fl;
+	struct rtable *rt;
+	struct net_device *sdev = NULL;
+	struct cxgb4i_snic *snic = NULL, *tmp;
+	int err, i;
+
+	memset(&fl, 0, sizeof(fl));
+	fl.nl_u.ip4_u.daddr = ipaddr;
+
+	err = ip_route_output_key(dev ? dev_net(dev) : &init_net, &rt, &fl);
+	if (err)
+		goto out;
+
+	sdev = (&rt->u.dst)->dev;
+	mutex_lock(&snic_rwlock);
+	list_for_each_entry_safe(snic, tmp, &snic_list, list_head) {
+		if (snic) {
+			for (i = 0; i < snic->lldi.nports; i++) {
+				if (sdev == snic->lldi.ports[i]) {
+					mutex_unlock(&snic_rwlock);
+					return snic;
+				}
+			}
+		}
+	}
+	mutex_unlock(&snic_rwlock);
+
+out:
+	snic = NULL;
+	return snic;
+}
+
+
+struct cxgb4i_snic *cxgb4i_snic_init(const struct cxgb4_lld_info *linfo)
+{
+	struct cxgb4i_snic *snic;
+	int i;
+
+	snic = kzalloc(sizeof(*snic), GFP_KERNEL);
+	if (snic) {
+
+		spin_lock_init(&snic->lock);
+		snic->lldi = *linfo;
+		snic->hba_cnt = snic->lldi.nports;
+		snic->funcs = cxgb4i_cplhandlers;
+
+		cxgb4i_iscsi_init();
+		cxgb4i_pdu_init();
+		cxgb4i_ddp_init(snic);
+		cxgb4i_ofld_init(snic);
+
+		for (i = 0; i < snic->hba_cnt; i++) {
+			snic->hba[i] = cxgb4i_hba_add(snic,
+						snic->lldi.ports[i]);
+			if (!snic->hba[i]) {
+				kfree(snic);
+				snic = ERR_PTR(-ENOMEM);
+				goto out;
+			}
+		}
+
+		mutex_lock(&snic_rwlock);
+		list_add_tail(&snic->list_head, &snic_list);
+		mutex_unlock(&snic_rwlock);
+	} else
+out :
+	snic = ERR_PTR(-ENOMEM);
+
+	return snic;
+}
+
+void cxgb4i_snic_cleanup(void)
+{
+	struct cxgb4i_snic *snic, *tmp;
+	int i;
+
+	mutex_lock(&snic_rwlock);
+	list_for_each_entry_safe(snic, tmp, &snic_list, list_head) {
+		list_del(&snic->list_head);
+
+		for (i = 0; i < snic->hba_cnt; i++) {
+			if (snic->hba[i]) {
+				cxgb4i_hba_remove(snic->hba[i]);
+				snic->hba[i] = NULL;
+			}
+		}
+		cxgb4i_ofld_cleanup(snic);
+		cxgb4i_ddp_cleanup(snic);
+		cxgb4i_log_info("snic 0x%p, %u scsi hosts removed.\n",
+				snic, snic->hba_cnt);
+
+		kfree(snic);
+	}
+	mutex_unlock(&snic_rwlock);
+
+	cxgb4i_pdu_cleanup();
+	cxgb4i_iscsi_cleanup();
+}
+
+static void *cxgb4i_uld_add(const struct cxgb4_lld_info *linfo)
+{
+	struct cxgb4i_snic *snic;
+
+	cxgb4i_log_info("%s", version);
+
+	snic = cxgb4i_snic_init(linfo);
+	if (!snic)
+		goto out;
+out:
+	return snic;
+}
+
+static int cxgb4i_uld_rx_handler(void *handle, const __be64 *rsp,
+				const struct pkt_gl *pgl)
+{
+	struct cxgb4i_snic *snic = handle;
+	struct sk_buff *skb;
+	const struct cpl_act_establish *rpl;
+	unsigned int opcode;
+
+	if (pgl == NULL) {
+		unsigned int len = 64 - sizeof(struct rsp_ctrl) - 8;
+
+		skb = alloc_skb(256, GFP_ATOMIC);
+		if (!skb)
+			goto nomem;
+		__skb_put(skb, len);
+		skb_copy_to_linear_data(skb, &rsp[1], len);
+
+	} else if (pgl == CXGB4_MSG_AN) {
+
+		return 0;
+
+	} else {
+
+		skb = cxgb4_pktgl_to_skb(pgl, 256, 256);
+		if (unlikely(!skb))
+			goto nomem;
+	}
+
+	rpl = cplhdr(skb);
+	opcode = rpl->ot.opcode;
+
+	cxgb4i_api_debug("snic %p, opcode 0x%x, skb %p\n",
+			 snic, opcode, skb);
+
+	BUG_ON(!snic->funcs[opcode]);
+
+	if (snic->funcs[opcode]) {
+		snic->funcs[opcode](snic, skb);
+	} else
+		cxgb4i_log_error("No handler for opcode 0x%x\n",
+				opcode);
+
+	return 0;
+
+nomem:
+	cxgb4i_api_debug("OOM bailing out\n");
+	return 1;
+}
+
+static int cxgb4i_uld_state_change(void *handle, enum cxgb4_state state)
+{
+	return 0;
+}
+
+static int __init cxgb4i_init_module(void)
+{
+	cxgb4_register_uld(CXGB4_ULD_ISCSI, &cxgb4i_uld_info);
+
+	return 0;
+}
+
+static void __exit cxgb4i_exit_module(void)
+{
+
+	cxgb4_unregister_uld(CXGB4_ULD_ISCSI);
+	cxgb4i_snic_cleanup();
+}
+
+module_init(cxgb4i_init_module);
+module_exit(cxgb4i_exit_module);
+
-- 
1.6.6.1

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply related

* [PATCH 1/3] cxgb4i: add build support
From: Rakesh Ranjan @ 2010-05-04 16:43 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, kxie-ut6Up61K2wZBDgjK7y7TUQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk,
	michaelc-hcNo3dDEHLuVc3sceRu5cw, Rakesh Ranjan
In-Reply-To: <1272991388-8527-1-git-send-email-rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

From: Rakesh Ranjan <rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>


Signed-off-by: Rakesh Ranjan <rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
 drivers/scsi/Kconfig        |    1 +
 drivers/scsi/Makefile       |    1 +
 drivers/scsi/cxgb4i/Kbuild  |    4 ++++
 drivers/scsi/cxgb4i/Kconfig |    7 +++++++
 4 files changed, 13 insertions(+), 0 deletions(-)
 create mode 100644 drivers/scsi/cxgb4i/Kbuild
 create mode 100644 drivers/scsi/cxgb4i/Kconfig

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 75f2336..fc3810a 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -371,6 +371,7 @@ config ISCSI_TCP
 	 http://open-iscsi.org
 
 source "drivers/scsi/cxgb3i/Kconfig"
+source "drivers/scsi/cxgb4i/Kconfig"
 source "drivers/scsi/bnx2i/Kconfig"
 source "drivers/scsi/be2iscsi/Kconfig"
 
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 1c7ac49..46dcdc8 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -133,6 +133,7 @@ obj-$(CONFIG_SCSI_STEX)		+= stex.o
 obj-$(CONFIG_SCSI_MVSAS)	+= mvsas/
 obj-$(CONFIG_PS3_ROM)		+= ps3rom.o
 obj-$(CONFIG_SCSI_CXGB3_ISCSI)	+= libiscsi.o libiscsi_tcp.o cxgb3i/
+obj-$(CONFIG_SCSI_CXGB4_ISCSI)	+= libiscsi.o libiscsi_tcp.o cxgb4i/
 obj-$(CONFIG_SCSI_BNX2_ISCSI)	+= libiscsi.o bnx2i/
 obj-$(CONFIG_BE2ISCSI)		+= libiscsi.o be2iscsi/
 obj-$(CONFIG_SCSI_PMCRAID)	+= pmcraid.o
diff --git a/drivers/scsi/cxgb4i/Kbuild b/drivers/scsi/cxgb4i/Kbuild
new file mode 100644
index 0000000..5892c6a
--- /dev/null
+++ b/drivers/scsi/cxgb4i/Kbuild
@@ -0,0 +1,4 @@
+EXTRA_CFLAGS += -I$(srctree)/drivers/net/cxgb4
+
+cxgb4i-y := libcxgbi.o cxgb4i_snic.o cxgb4i_iscsi.o cxgb4i_offload.o cxgb4i_ddp.o 
+obj-$(CONFIG_SCSI_CXGB4_ISCSI) += cxgb4i.o
diff --git a/drivers/scsi/cxgb4i/Kconfig b/drivers/scsi/cxgb4i/Kconfig
new file mode 100644
index 0000000..3f33dc2
--- /dev/null
+++ b/drivers/scsi/cxgb4i/Kconfig
@@ -0,0 +1,7 @@
+config SCSI_CXGB4_ISCSI
+	tristate "Chelsio T4 iSCSI support"
+	depends on CHELSIO_T4_DEPENDS
+	select CHELSIO_T4
+	select SCSI_ISCSI_ATTRS
+	---help---
+	This driver supports iSCSI offload for the Chelsio T4 series devices.
-- 
1.6.6.1

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply related

* cxgb4i submission - V2
From: Rakesh Ranjan @ 2010-05-04 16:43 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, kxie-ut6Up61K2wZBDgjK7y7TUQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk,
	michaelc-hcNo3dDEHLuVc3sceRu5cw

The following 3 patches add a new iscsi LLD driver cxgb4i to enable iscsi offload
support on Chelsio's new 1G and 10G cards. This is updated version of previous cxgb4i
patch. Please share you commnets after review.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Stevens @ 2010-05-04 16:12 UTC (permalink / raw)
  To: Brian Haley; +Cc: David Miller, enh, netdev, netdev-owner
In-Reply-To: <4BE031FA.6040006@hp.com>

I think the original code was intending to do late binding -- carry "-1" 
as
meaning "not set by user" and use the default value _at_the_time_of_
_the_send_, and in its context. For that to have worked, the checks for
"<0" in the send paths should've checked for multicast and used the
multicast default as you're saying, Brian. And doing that not on the
set, but when generating packets, is what I would've expected.

I don't see anything that's broken by changing it to use the default at
the time of the set since for mcast the default is really a constant,
and in fact, it looks like in addition to not actually using the default 
of 1,
it was returning "-1" in the cmsg when not set by the user (and it, too,
should've been "1", which it would return now).

But if the default is different for each destination or interface in
the multicast case (ie, by adding conf settings for mcast), then
it really should do late binding and leave it as "-1" in the set, right?
That's what I thought it was already doing, but apparently not;
I think it used to, but maybe I just didn't notice.

                                        +-DLS

^ permalink raw reply

* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Stephen Hemminger @ 2010-05-04 16:12 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Ben Hutchings, netdev
In-Reply-To: <l2s571fb4001005040728t91979906ofa10cf0714c305b2@mail.gmail.com>

On Tue, 4 May 2010 19:58:32 +0530
Bhaskar Dutta <bhaskie@gmail.com> wrote:

> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
> >> Hi,
> >>
> >> I am observing intermittent TCP-MD5 checksum failures
> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
> >>
> >> The problem is only seen in multi-core 64 bit machines.
> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
> >> that the percpu allocator has been re-implemented in 2.6.33) that
> >> might cause a corruption in 64 bit SMP machines?
> >>
> >> Any pointers would be appreciated.
> >
> > There was another recent report of incorrect MD5 signatures in
> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
> > response.
> >
> > Ben.
> >
> 
> I found another thread posted back in Jan 2007 with a similar bug
> (x86_64 on 2.6.20) but no replies to that as well.
> http://lkml.org/lkml/2007/1/20/56

2.6.20 had lots of other MD5 bugs. Your problem might be related to
GRO.  MD5 may not handle multi-fragment packets.

^ permalink raw reply

* Re: [PATCH] net/gianfar: drop recycled skbs on MTU change
From: Andy Fleming @ 2010-05-04 15:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Andy Fleming, netdev
In-Reply-To: <20100503151745.GA17997@Chamillionaire.breakpoint.cc>

On Mon, May 3, 2010 at 8:17 AM, Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> The size for skb which is added to the recycled list is using the
> current descriptor size which is current MTU. gfar_new_skb() is also
> using this size. So after changing or alteast increasing the MTU all
> recycled skbs should be dropped.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> I'm not 100% sure but it looks like it is wrong.
>
>  drivers/net/gianfar.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 5267c27..9093106 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2287,8 +2287,10 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
>
>        /* Only stop and start the controller if it isn't already
>         * stopped, and we changed something */
> -       if ((oldsize != tempsize) && (dev->flags & IFF_UP))
> +       if ((oldsize != tempsize) && (dev->flags & IFF_UP)) {
>                stop_gfar(dev);
> +               skb_queue_purge(&priv->rx_recycle);
> +       }


I think we should probably do this in free_skb_resources.  And remove
the call from gfar_close().

Andy

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox