Netdev List

* Re: [PATCH V2] netfilter/iptables: Fix log-level processing
From: Pablo Neira Ayuso @ 2012-09-12 15:25 UTC (permalink / raw)
  To: joe
  Cc: netfilter-devel, netdev, Bart De Schuymer, Patrick McHardy,
	Stephen Hemminger
In-Reply-To: <1347437245.13103.697.camel@edumazet-glaptop>

I have applied this patch. Thanks Joe.


* Re: [PATCH net-next V3 1/2] IB/ipoib: Add rtnl_link_ops support
From: Eric Dumazet @ 2012-09-12 15:13 UTC (permalink / raw)
  To: Rami Rosen; +Cc: Or Gerlitz, Patrick McHardy, netdev, Shlomo Pongratz
In-Reply-To: <CAKoUAr=8zZA9EgvJEVrd0a-Uw=zzksHj+5P2Sp3Lb4vXgSJRYA@mail.gmail.com>

On Wed, 2012-09-12 at 17:53 +0300, Rami Rosen wrote:
> Hi,
> 
> From the dump of CPU #1, it seems indeed not related at all to "modprobe -r".
> 
> Could it be that there is some IB stack sysfs write activity?
> (regardless of the modprobe -r" you issued) ?  I see some candidates
> for it.
> 
> delete_child() is a method of the IB stack (ipoib/ipoib_main.c)
> 
> Maybe in order to help debug the problem, you might try to add in
> delete_child() method, print of the name of the attribute which is
> being deleted ?
> 
>   (struct device_attribute has a a member "struct attribute attr",
> which in turn has  "const char *name").


It might be related to module load/unload

udevd or some external daemon can access sysfs files while you unload
the module


* [PATCH 9/9] drivers/isdn/gigaset/common.c: Remove useless kfree
From: Peter Senna Tschudin @ 2012-09-12 15:06 UTC (permalink / raw)
  To: Hansjoerg Lipp
  Cc: kernel-janitors, Tilman Schmidt, Karsten Keil, gigaset307x-common,
	netdev, linux-kernel

From: Peter Senna Tschudin <peter.senna@gmail.com>

Remove useless kfree() and clean up code related to the removal.

The semantic patch that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
position p1,p2;
expression x;
@@

if (x@p1 == NULL) { ... kfree@p2(x); ... return ...; }

@unchanged exists@
position r.p1,r.p2;
expression e <= r.x,x,e1;
iterator I;
statement S;
@@

if (x@p1 == NULL) { ... when != I(x,...) S
                        when != e = e1
                        when != e += e1
                        when != e -= e1
                        when != ++e
                        when != --e
                        when != e++
                        when != e--
                        when != &e
   kfree@p2(x); ... return ...; }

@ok depends on unchanged exists@
position any r.p1;
position r.p2;
expression x;
@@

... when != true x@p1 == NULL
kfree@p2(x);

@depends on !ok && unchanged@
position r.p2;
expression x;
@@

*kfree@p2(x);
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>

---
 drivers/isdn/gigaset/common.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/isdn/gigaset/common.c b/drivers/isdn/gigaset/common.c
index aa41485..30a6b17 100644
--- a/drivers/isdn/gigaset/common.c
+++ b/drivers/isdn/gigaset/common.c
@@ -1123,7 +1123,6 @@ struct gigaset_driver *gigaset_initdriver(unsigned minor, unsigned minors,
 	return drv;
 
 error:
-	kfree(drv->cs);
 	kfree(drv);
 	return NULL;
 }


* Re: [PATCH net-next V3 1/2] IB/ipoib: Add rtnl_link_ops support
From: Rami Rosen @ 2012-09-12 14:53 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Patrick McHardy, Eric Dumazet, netdev, Shlomo Pongratz
In-Reply-To: <5050668B.1010105@mellanox.com>

Hi,

From the dump of CPU #1, it indeed seems unrelated to "modprobe -r".

Could it be that there is some IB stack sysfs write activity
(regardless of the "modprobe -r" you issued)?  I see some candidates
for it.

delete_child() is a method of the IB stack (ipoib/ipoib_main.c).

To help debug the problem, you might try adding a print to the
delete_child() method that shows the name of the attribute being
deleted.

  (struct device_attribute has a member "struct attribute attr",
which in turn has "const char *name").

Regards,
Rami Rosen
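
The debugging aid suggested in this mail could look roughly like the
userspace mock-up below. It is only a sketch: the two structs are cut
down to the members named above, format_attr_debug() and the attribute
name used with it are hypothetical, and in the driver the output would
go through a printk-style call inside delete_child() rather than into
a buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Cut-down stand-ins for the kernel structs; only the members named above. */
struct attribute {
	const char *name;
};

struct device_attribute {
	struct attribute attr;
};

/*
 * Hypothetical helper: format the line that a debug print in
 * delete_child() might emit. Returns the snprintf() result.
 */
static int format_attr_debug(char *buf, size_t len,
			     const struct device_attribute *dattr)
{
	const char *name = "(unknown)";

	assert(buf != NULL && len > 0);
	if (dattr != NULL && dattr->attr.name != NULL)
		name = dattr->attr.name;
	return snprintf(buf, len, "delete_child: attribute '%s'", name);
}
```

Calling format_attr_debug(buf, sizeof(buf), attr) from the store method
and printing buf would show which attribute is being written while the
module goes away.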


* Re: GRO aggregation
From: Shlomo Pongartz @ 2012-09-12 14:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <1347442394.13103.703.camel@edumazet-glaptop>

On 9/12/2012 12:33 PM, Eric Dumazet wrote:
> On Wed, 2012-09-12 at 12:23 +0300, Shlomo Pongartz wrote:
>> On 9/11/2012 10:35 PM, Eric Dumazet wrote:
>>> On Tue, 2012-09-11 at 19:24 +0000, Shlomo Pongratz wrote:
>>>
>>>> I see that in ixgbe the weight for the NAPI is 64 (netif_napi_add). So
>>>> if packets are arriving in high rate then an the CPU is fast enough to
>>>> collect the packets as they arrive, assuming packets continue to
>>>> arrives while the NAPI runs. Then it should have aggregate more. So we
>>>> will have less passes trough the stack.
>>>>
>>> As I said, _if_ your cpu was loaded by other stuff, then you would see
>>> biggest GRO packets.
>>>
>>> GRO is not : "We want to kill latency and have big packets just because
>>> its better"
>>>
>>> Its more like : If load is big enough, try to aggregate TCP frames in
>>> less skbs.
>>>
>>>
>>>
>>>
>> First I want to apologize for breaking the mailing thread. I wasn't at
>> work and used webmail.
>>
>> I agree with your but I think that something is still strange.
>> On the transmitter side all the offloading are enabled, e.g. TSO and GSO.
>> The tcpdump on the sender side shows size of 64240 which is 44 packets
>> of 1460 each.
>> Now since the offloading are enabled the HW should transmit 44 frames
>> back to back,
>> that is in a burst of 44 * 1500 bytes, which according to my calculation
>> should take 52.8 micro on 10G Ethernet.
>> Using ethtool I've set the rx-usecs to 1022 micro, which I think is the
>> maximal value for ixgbe.
>> Note that there is no way to set rx-frames on ixgbe.
>> Now since the ixgbe weight is 64 I expected that the NAPI will be able
>> to poll for more then 21 packets,
>> since 44 packets came in one burst.
>> However the results remains the same.
> TSO uses PAGE frags, so 64KB needs about 16 pages.
>
> tcp_sendmsg() could even use order-3 pages, so that only 2 pages would
> be needed to fill 64KB of data.
>
> GRO uses whatever fragment size provided by NIC, depending on MTU.
>
> One skb has a limit on number of frags.
>
> Handling a huge array of frags would be actually slower in some helper
> functions.
>
> Since you dont exactly describe why you ask all these questions, its
> hard to guess what problem you try to solve.
>
>
>
> .
>
Hi Eric

TSO is just a means to create a burst of frames on the wire so that
NAPI will be able to poll as much as possible.
I'm looking at the aggregation done by GRO on behalf of IPoIB. With
IPoIB I added a counter that counts how many
packets were aggregated before napi_complete is called (either directly
or by net_rx_action) and found that although
NAPI consumes 64 packets on average before napi_complete is called,
tcpdump shows that no more than 16-17
packets were aggregated. BTW when I increased the MTU to 4K I did
reach 64K aggregation, which again is 16-17 packets.
So in order to see if 17 packets is the aggregation limit I wanted to
see how ixgbe is doing and found that it aggregates 21 packets.
So I wanted to know if there is another factor that governs the
aggregation, one that I can tune.

Shlomo.
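
The figures quoted in this thread (44 segments for 64240 bytes, 52.8
microseconds for the burst, 16 pages or 2 order-3 pages for 64KB) can
be re-derived with a few lines of C. This is a sketch only: the helper
names are made up, and the wire time ignores preamble, inter-frame gap
and header overhead, just as the estimate in the quoted mail does.

```c
#include <assert.h>

/* Round-up division: how many `unit`-sized pieces are needed for `total`. */
static unsigned int ceil_div(unsigned int total, unsigned int unit)
{
	assert(unit > 0);
	return (total + unit - 1) / unit;
}

/* Wire time in nanoseconds for `frames` frames of `frame_bytes` bytes each. */
static unsigned long long burst_ns(unsigned int frames,
				   unsigned int frame_bytes,
				   unsigned long long link_bps)
{
	assert(link_bps > 0);
	return (unsigned long long)frames * frame_bytes * 8ULL *
	       1000000000ULL / link_bps;
}
```

With these, ceil_div(64240, 1460) gives the segment count,
burst_ns(44, 1500, 10000000000ULL) the burst duration in nanoseconds,
and ceil_div(65536, 4096) or ceil_div(65536, 8 * 4096) the page counts
for 4KB pages and order-3 pages.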


* Re: [PATCHv4] virtio-spec: virtio network device multiqueue support
From: Tom Herbert @ 2012-09-12 14:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, Jason Wang, kvm, virtualization, netdev, pbonzini,
	levinsasha928, rick.jones2
In-Reply-To: <20120912075737.GA30455@redhat.com>

On Wed, Sep 12, 2012 at 12:57 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Sep 12, 2012 at 03:19:11PM +0930, Rusty Russell wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>> > On 09/10/2012 02:33 PM, Michael S. Tsirkin wrote:
>> >> A final addition: what you suggest above would be
>> >> "TX follows RX", right?
>>
>> BTW, yes.  But it's a weird way to express what the nic is doing.
>
> It explains what the system is doing.
> TX is done by driver, RX by nic.
> We document both driver and device in the spec
> so I thought it's fine. any suggestions wellcome.
>
>> >> It is in anticipation of something like that, that I made
>> >> steering programming so generic.
>>
>> >> I think TX follows RX is more immediately useful for reasons above
>> >> but we can add both to spec and let drivers and devices
>> >> decide what they want to support.
>>
>> You mean "RX follows TX"?  ie. accelerated RFS.  I agree.
>
RX following TX is logic of flow director I believe.  {a}RFS has RX
follow CPU where application receive is done on the socket.  So in RFS
there is no requirement to have a 1-1 correspondence between TX and RX
queues, and in fact this allows different number of queues between TX
and RX.  We found this necessary when using priority HW queues, so
that there are more TX queues than RX.

>
> Yes that's what I meant. Thanks for the correction.
>
>> Perhaps Tom can explain how we avoid out-of-order receive for the
>> accelerated RFS case?  It's not clear to me, but we need to be able to
>> do that for virtio-net if it implements accelerated RFS.
>
AFAIK ooo RX is still possible with accelerated RFS.  We have an
algorithm that prevents this for RFS by deferring a migration to a new
queue as long as it's possible that a flow might have outstanding
packets on the old queue.  I suppose this could be implemented in the
device for the HW queues, but I don't think it would be easy to cover
all cases where packets were already in transit to the host or other
cases where host and device queues are out of sync.

> Basically this has tx vq per cpu and relies on scheduler not bouncing threads
> between cpus too aggressively. Appears to be what ixgbe does.
>
>> > AFAIK, ixgbe does "rx follows tx". The only differences between ixgbe
>> > and virtio-net is that ixgbe driver programs the flow director during
>> > packet transmission but we suggest to do it silently in the device for
>> > simplicity.
>>
>> Implying the receive queue by xmit will be slightly laggy.  Don't know
>> if that's a problem.
>>
>> Cheers,
>> Rusty.
>
> Doesn't seem to be a problem in Jason's testing so far.
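
The deferral described above for RFS (do not move a flow to a new queue
while it may still have packets outstanding on the old one) can be
modelled in a few lines. This is a toy sketch under stated assumptions,
not the kernel's code: the struct and function names are hypothetical,
and each queue is reduced to a pair of counters.

```c
#include <assert.h>
#include <stddef.h>

/* Toy receive queue: packets enqueued so far and packets processed so far. */
struct rx_queue {
	unsigned int tail;
	unsigned int head;
};

/* Toy per-flow state. */
struct flow {
	int cur_queue;		/* queue the flow is currently steered to */
	unsigned int last_tail;	/* tail of cur_queue after the flow's last enqueue */
};

/* Enqueue one packet of the flow on its current queue. */
static void enqueue(struct flow *f, struct rx_queue *queues)
{
	struct rx_queue *q = &queues[f->cur_queue];

	q->tail++;
	f->last_tail = q->tail;
}

/* Process one packet from a queue. */
static void process_one(struct rx_queue *q)
{
	assert(q->head != q->tail);
	q->head++;
}

/*
 * Pick the queue for the flow's next packet. The move to `wanted` is
 * deferred until everything the flow enqueued on its old queue has
 * been processed, so packets cannot overtake each other.
 */
static int steer(struct flow *f, const struct rx_queue *queues, int wanted)
{
	assert(f != NULL && queues != NULL);
	if (f->cur_queue != wanted) {
		const struct rx_queue *old = &queues[f->cur_queue];

		/* Signed difference keeps the test valid across counter wrap. */
		if ((int)(old->head - f->last_tail) >= 0)
			f->cur_queue = wanted;
	}
	return f->cur_queue;
}
```

The model only works because one party sees both counters. A device
doing the steering by itself would not know how far the host has
drained the old queue, which is the difficulty raised above for the
accelerated case.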


* Re: [PATCHv4] virtio-spec: virtio network device multiqueue support
From: Tom Herbert @ 2012-09-12 14:38 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kvm, Michael S. Tsirkin, netdev, rick.jones2, virtualization,
	levinsasha928, pbonzini
In-Reply-To: <87har3dc4o.fsf@rustcorp.com.au>


On Tue, Sep 11, 2012 at 10:49 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:

> Jason Wang <jasowang@redhat.com> writes:
> > On 09/10/2012 02:33 PM, Michael S. Tsirkin wrote:
> >> A final addition: what you suggest above would be
> >> "TX follows RX", right?
>
> BTW, yes.  But it's a weird way to express what the nic is doing.
>
> >> It is in anticipation of something like that, that I made
> >> steering programming so generic.
>
> >> I think TX follows RX is more immediately useful for reasons above
> >> but we can add both to spec and let drivers and devices
> >> decide what they want to support.
>
> You mean "RX follows TX"?  ie. accelerated RFS.  I agree.
>
RX following TX is logic of flow director I believe.  {a}RFS has RX follow
CPU where application receive is done on the socket.  So in RFS there is no
requirement to have a 1-1 correspondence between TX and RX queues, and in
fact this allows different number of queues between TX and RX.  We found
this necessary when using priority HW queues, so that there are more TX
queues than RX.

> Perhaps Tom can explain how we avoid out-of-order receive for the
> accelerated RFS case?  It's not clear to me, but we need to be able to
> do that for virtio-net if it implements accelerated RFS.
>
AFAIK ooo RX is possible with accelerated RFS.  We have an algorithm that
prevents this for RFS case by deferring a migration to a new queue as long
as it's possible that a flow might have outstanding packets on the old
queue.  I suppose this could be implemented in the device for the HW
queues, but I don't think it would be easy to cover all cases where packets
were already in transit to the host or other cases where host and device
queues are out of sync.


> > AFAIK, ixgbe does "rx follows tx". The only differences between ixgbe
> > and virtio-net is that ixgbe driver programs the flow director during
> > packet transmission but we suggest to do it silently in the device for
> > simplicity.
>
> Implying the receive queue by xmit will be slightly laggy.  Don't know
> if that's a problem.
>
> Cheers,
> Rusty.
>


* [PATCH v4 6/8] cgroup: Do not depend on a given order when populating the subsys array
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, Gao feng, Jamal Hadi Salim, John Fastabend,
	Li Zefan, Neil Horman, Tejun Heo
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

The *_subsys_id will be used as an index to access the subsys array.
Therefore we need to take care to populate each subsystem at the correct
position by using designated initializers.

With this change we are able to interleave builtin and module subsystems
in the subsys array.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 769600c..343ab4e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -92,7 +92,7 @@ static DEFINE_MUTEX(cgroup_root_mutex);
  * registered after that. The mutable section of this array is protected by
  * cgroup_mutex.
  */
-#define SUBSYS(_x) &_x ## _subsys,
+#define SUBSYS(_x) [_x ## _subsys_id] = &_x ## _subsys,
 #define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
 static struct cgroup_subsys *subsys[CGROUP_SUBSYS_COUNT] = {
 #include <linux/cgroup_subsys.h>
-- 
1.7.12.315.g682ce8b
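
For readers unfamiliar with the idiom, here is a standalone sketch of
what the new SUBSYS() definition achieves. SUBSYS_LIST and
SUBSYS_LIST_SHUFFLED are hypothetical stand-ins for including
<linux/cgroup_subsys.h>, and the three controllers are picked only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

struct cgroup_subsys {
	const char *name;
};

/* Stand-in for the header: one SUBSYS() entry per controller. */
#define SUBSYS_LIST \
	SUBSYS(cpu) \
	SUBSYS(net_cls) \
	SUBSYS(net_prio)

/* Same entries in a different order, to show that order no longer matters. */
#define SUBSYS_LIST_SHUFFLED \
	SUBSYS(net_prio) \
	SUBSYS(cpu) \
	SUBSYS(net_cls)

/* First expansion: build the id enum. */
#define SUBSYS(_x) _x ## _subsys_id,
enum cgroup_subsys_id {
	SUBSYS_LIST
	CGROUP_SUBSYS_COUNT
};
#undef SUBSYS

static struct cgroup_subsys cpu_subsys = { "cpu" };
static struct cgroup_subsys net_cls_subsys = { "net_cls" };
static struct cgroup_subsys net_prio_subsys = { "net_prio" };

/* Second expansion: each pointer lands in the slot named by its id. */
#define SUBSYS(_x) [_x ## _subsys_id] = &_x ## _subsys,
static struct cgroup_subsys *subsys[CGROUP_SUBSYS_COUNT] = {
	SUBSYS_LIST_SHUFFLED
};
#undef SUBSYS

static struct cgroup_subsys *subsys_lookup(enum cgroup_subsys_id id)
{
	assert((int)id >= 0 && (int)id < CGROUP_SUBSYS_COUNT);
	return subsys[id];
}
```

With the old positional form of the macro, the shuffled list would have
put net_prio into slot 0. With the designated form, subsys_lookup()
returns the matching subsystem for every id.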


* [PATCH v4 7/8] cgroup: Assign subsystem IDs during compile time
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, David S. Miller, Paul E. McKenney, Andrew Morton,
	Eric Dumazet, Gao feng, Glauber Costa, Herbert Xu,
	Jamal Hadi Salim, John Fastabend, Kamezawa Hiroyuki, Li Zefan,
	Neil Horman, Tejun Heo
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

WARNING: With this change it is no longer possible to load externally
built controllers.

When CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m are set, the
corresponding subsys_id should also be a constant. Up to now,
net_prio_subsys_id and net_cls_subsys_id were of type int and
their values were assigned at runtime.

By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
to IS_ENABLED, all *_subsys_id will have a constant value. That means we
need to remove all the code which assumes a value can be assigned to
net_prio_subsys_id and net_cls_subsys_id.

A close look is necessary at the RCU part, which was introduced by
the following commit:

commit f845172531fb7410c7fb7780b1a6e51ee6df7d52
Author:	Herbert Xu <herbert@gondor.apana.org.au>  Mon May 24 09:12:34 2010
Committer:	David S. Miller <davem@davemloft.net>  Mon May 24 09:12:34 2010

cls_cgroup: Store classid in struct sock

This code was added to init_cgroup_cls()

	/* We can't use rcu_assign_pointer because this is an int. */
	smp_wmb();
	net_cls_subsys_id = net_cls_subsys.subsys_id;

and to exit_cgroup_cls()

	net_cls_subsys_id = -1;
	synchronize_rcu();

and to the module version of task_cls_classid()

	rcu_read_lock();
	id = rcu_dereference(net_cls_subsys_id);
	if (id >= 0)
		classid = container_of(task_subsys_state(p, id),
				       struct cgroup_cls_state, css)->classid;
	rcu_read_unlock();

without an explicit explanation of why the RCU part is needed. (The
rcu_dereference() was fixed by changing it to
rcu_dereference_index_check() in a later commit, but that is a minor
detail.)

So here is my reasoning about why it was introduced and why it is safe
to remove it now. Note that this code was copied over to net_prio, so
the reasoning holds for that subsystem too.

The idea behind the RCU use for net_cls_subsys_id is to make sure we
get a valid pointer back from task_subsys_state(). task_subsys_state()
just blindly accesses the subsys array and returns the
pointer. Obviously, passing in -1 as id into task_subsys_state()
returns an invalid value (below the lower bound of the array).

So this code makes sure that the id is assigned only after the module
is loaded and the subsystem is registered.

Before unregistering the module, all old readers must have left the
critical section. This is done by assigning -1 to the id and calling
synchronize_rcu(). Any new readers won't call task_subsys_state()
anymore and therefore it is safe to unregister the subsystem.

The new code relies on the same trick, but it looks at the subsys
pointer returned by task_subsys_state() (remember the id is constant
and therefore we always have a valid index into the subsys
array).

No precautions need to be taken during module loading. Eventually,
all CPUs will get a valid pointer back from
task_subsys_state() because rebind_subsystems(), which is called after
the module init() function, will assign the newly loaded module
subsystem pointer to subsys[net_cls_subsys_id].

When the subsystem is about to be removed, rebind_subsystems() will be
called before the module exit() function. In this case,
rebind_subsystems() will assign a NULL pointer to
subsys[net_cls_subsys_id] and then call synchronize_rcu(). By then all
old readers have left the critical section. Any new reader won't access
the subsystem anymore.  At this point we are safe to unregister the
subsystem. No additional synchronize_rcu() call is needed in the module
exit function.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/linux/cgroup.h       |  2 +-
 include/net/cls_cgroup.h     | 12 ++++--------
 include/net/netprio_cgroup.h | 18 +++++-------------
 kernel/cgroup.c              | 22 +++-------------------
 net/core/netprio_cgroup.c    | 11 -----------
 net/core/sock.c              | 11 -----------
 net/sched/cls_cgroup.c       | 13 -------------
 7 files changed, 13 insertions(+), 76 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a5ab565..018f819 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -46,7 +46,7 @@ extern const struct file_operations proc_cgroup_operations;
 
 /* Define the enumeration of all builtin cgroup subsystems */
 #define SUBSYS(_x) _x ## _subsys_id,
-#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
+#define IS_SUBSYS_ENABLED(option) IS_ENABLED(option)
 enum cgroup_subsys_id {
 #include <linux/cgroup_subsys.h>
 	__CGROUP_TEMPORARY_PLACEHOLDER
diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index 9bd5db9..b6a6eeb 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -42,22 +42,18 @@ static inline u32 task_cls_classid(struct task_struct *p)
 	return classid;
 }
 #elif IS_MODULE(CONFIG_NET_CLS_CGROUP)
-
-extern int net_cls_subsys_id;
-
 static inline u32 task_cls_classid(struct task_struct *p)
 {
-	int id;
+	struct cgroup_subsys_state *css;
 	u32 classid = 0;
 
 	if (in_interrupt())
 		return 0;
 
 	rcu_read_lock();
-	id = rcu_dereference_index_check(net_cls_subsys_id,
-					 rcu_read_lock_held());
-	if (id >= 0)
-		classid = container_of(task_subsys_state(p, id),
+	css = task_subsys_state(p, net_cls_subsys_id);
+	if (css)
+		classid = container_of(css,
 				       struct cgroup_cls_state, css)->classid;
 	rcu_read_unlock();
 
diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
index b202de8..2760f4f 100644
--- a/include/net/netprio_cgroup.h
+++ b/include/net/netprio_cgroup.h
@@ -30,10 +30,6 @@ struct cgroup_netprio_state {
 	u32 prioidx;
 };
 
-#ifndef CONFIG_NETPRIO_CGROUP
-extern int net_prio_subsys_id;
-#endif
-
 extern void sock_update_netprioidx(struct sock *sk, struct task_struct *task);
 
 #if IS_BUILTIN(CONFIG_NETPRIO_CGROUP)
@@ -55,18 +51,14 @@ static inline u32 task_netprioidx(struct task_struct *p)
 
 static inline u32 task_netprioidx(struct task_struct *p)
 {
-	struct cgroup_netprio_state *state;
-	int subsys_id;
+	struct cgroup_subsys_state *css;
 	u32 idx = 0;
 
 	rcu_read_lock();
-	subsys_id = rcu_dereference_index_check(net_prio_subsys_id,
-						rcu_read_lock_held());
-	if (subsys_id >= 0) {
-		state = container_of(task_subsys_state(p, subsys_id),
-				     struct cgroup_netprio_state, css);
-		idx = state->prioidx;
-	}
+	css = task_subsys_state(p, net_prio_subsys_id);
+	if (css)
+		idx = container_of(css,
+				   struct cgroup_netprio_state, css)->prioidx;
 	rcu_read_unlock();
 	return idx;
 }
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 343ab4e..4a364f1 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4458,24 +4458,8 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	/* init base cftset */
 	cgroup_init_cftsets(ss);
 
-	/*
-	 * need to register a subsys id before anything else - for example,
-	 * init_cgroup_css needs it.
-	 */
 	mutex_lock(&cgroup_mutex);
-	/* find the first empty slot in the array */
-	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-		if (subsys[i] == NULL)
-			break;
-	}
-	if (i == CGROUP_SUBSYS_COUNT) {
-		/* maximum number of subsystems already registered! */
-		mutex_unlock(&cgroup_mutex);
-		return -EBUSY;
-	}
-	/* assign ourselves the subsys_id */
-	ss->subsys_id = i;
-	subsys[i] = ss;
+	subsys[ss->subsys_id] = ss;
 
 	/*
 	 * no ss->create seems to need anything important in the ss struct, so
@@ -4484,7 +4468,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	css = ss->create(dummytop);
 	if (IS_ERR(css)) {
 		/* failure case - need to deassign the subsys[] slot. */
-		subsys[i] = NULL;
+		subsys[ss->subsys_id] = NULL;
 		mutex_unlock(&cgroup_mutex);
 		return PTR_ERR(css);
 	}
@@ -4500,7 +4484,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 		if (ret) {
 			dummytop->subsys[ss->subsys_id] = NULL;
 			ss->destroy(dummytop);
-			subsys[i] = NULL;
+			subsys[ss->subsys_id] = NULL;
 			mutex_unlock(&cgroup_mutex);
 			return ret;
 		}
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index c75e3f9..6bc460c 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -326,9 +326,7 @@ struct cgroup_subsys net_prio_subsys = {
 	.create		= cgrp_create,
 	.destroy	= cgrp_destroy,
 	.attach		= net_prio_attach,
-#ifdef CONFIG_NETPRIO_CGROUP
 	.subsys_id	= net_prio_subsys_id,
-#endif
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE
 };
@@ -366,10 +364,6 @@ static int __init init_cgroup_netprio(void)
 	ret = cgroup_load_subsys(&net_prio_subsys);
 	if (ret)
 		goto out;
-#ifndef CONFIG_NETPRIO_CGROUP
-	smp_wmb();
-	net_prio_subsys_id = net_prio_subsys.subsys_id;
-#endif
 
 	register_netdevice_notifier(&netprio_device_notifier);
 
@@ -386,11 +380,6 @@ static void __exit exit_cgroup_netprio(void)
 
 	cgroup_unload_subsys(&net_prio_subsys);
 
-#ifndef CONFIG_NETPRIO_CGROUP
-	net_prio_subsys_id = -1;
-	synchronize_rcu();
-#endif
-
 	rtnl_lock();
 	for_each_netdev(&init_net, dev) {
 		old = rtnl_dereference(dev->priomap);
diff --git a/net/core/sock.c b/net/core/sock.c
index ca3eaee..47b4ac0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -326,17 +326,6 @@ int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(__sk_backlog_rcv);
 
-#if defined(CONFIG_CGROUPS)
-#if !defined(CONFIG_NET_CLS_CGROUP)
-int net_cls_subsys_id = -1;
-EXPORT_SYMBOL_GPL(net_cls_subsys_id);
-#endif
-#if !defined(CONFIG_NETPRIO_CGROUP)
-int net_prio_subsys_id = -1;
-EXPORT_SYMBOL_GPL(net_prio_subsys_id);
-#endif
-#endif
-
 static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
 {
 	struct timeval tv;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 7743ea8..67cf90d 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -77,9 +77,7 @@ struct cgroup_subsys net_cls_subsys = {
 	.name		= "net_cls",
 	.create		= cgrp_create,
 	.destroy	= cgrp_destroy,
-#ifdef CONFIG_NET_CLS_CGROUP
 	.subsys_id	= net_cls_subsys_id,
-#endif
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE,
 };
@@ -283,12 +281,6 @@ static int __init init_cgroup_cls(void)
 	if (ret)
 		goto out;
 
-#ifndef CONFIG_NET_CLS_CGROUP
-	/* We can't use rcu_assign_pointer because this is an int. */
-	smp_wmb();
-	net_cls_subsys_id = net_cls_subsys.subsys_id;
-#endif
-
 	ret = register_tcf_proto_ops(&cls_cgroup_ops);
 	if (ret)
 		cgroup_unload_subsys(&net_cls_subsys);
@@ -301,11 +293,6 @@ static void __exit exit_cgroup_cls(void)
 {
 	unregister_tcf_proto_ops(&cls_cgroup_ops);
 
-#ifndef CONFIG_NET_CLS_CGROUP
-	net_cls_subsys_id = -1;
-	synchronize_rcu();
-#endif
-
 	cgroup_unload_subsys(&net_cls_subsys);
 }
 
-- 
1.7.12.315.g682ce8b
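
The reader side after this patch boils down to "look up the pointer
with a constant id, then check it for NULL". The following userspace
sketch mirrors that shape. It is an illustration under stated
assumptions: struct task is a toy stand-in for task_struct, the structs
are cut down to what the example needs, and the RCU read-side locking
that the kernel version uses is left out.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct cgroup_subsys_state {
	int placeholder;	/* real struct has more; unused here */
};

struct cgroup_cls_state {
	struct cgroup_subsys_state css;
	unsigned int classid;
};

/* The id is now a compile-time constant, so it is always a valid index. */
enum { net_cls_subsys_id, CGROUP_SUBSYS_COUNT };

/* Toy task: one css pointer per id, NULL while the module is not loaded. */
struct task {
	struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
};

static struct cgroup_subsys_state *task_subsys_state(const struct task *t,
						     int id)
{
	assert(id >= 0 && id < CGROUP_SUBSYS_COUNT);
	return t->subsys[id];
}

static unsigned int task_cls_classid(const struct task *t)
{
	struct cgroup_subsys_state *css;
	unsigned int classid = 0;

	/* Kernel code wraps this in rcu_read_lock()/rcu_read_unlock(). */
	css = task_subsys_state(t, net_cls_subsys_id);
	if (css)
		classid = container_of(css,
				       struct cgroup_cls_state, css)->classid;
	return classid;
}
```

The NULL check takes over the role of the old "id >= 0" test: an
unloaded module shows up as a NULL pointer in a valid slot instead of
as an invalid index.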


* [PATCH v4 8/8] cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, Gao feng, Jamal Hadi Salim, John Fastabend,
	Li Zefan, Neil Horman, Tejun Heo
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

Since we know exactly how many subsystems exist at compile time, we are
able to define CGROUP_SUBSYS_COUNT correctly. CGROUP_SUBSYS_COUNT will
be at most 12 (all controllers enabled). Depending on the architecture
we save either 32 - 12 pointers (80 bytes) or 64 - 12 pointers (416
bytes) per cgroup.

With this change we can also remove the temporary placeholder that was
needed to avoid compilation errors.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/linux/cgroup.h | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 018f819..df354ae 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -49,16 +49,10 @@ extern const struct file_operations proc_cgroup_operations;
 #define IS_SUBSYS_ENABLED(option) IS_ENABLED(option)
 enum cgroup_subsys_id {
 #include <linux/cgroup_subsys.h>
-	__CGROUP_TEMPORARY_PLACEHOLDER
+	CGROUP_SUBSYS_COUNT,
 };
 #undef IS_SUBSYS_ENABLED
 #undef SUBSYS
-/*
- * This define indicates the maximum number of subsystems that can be loaded
- * at once. We limit to this many since cgroupfs_root has subsys_bits to keep
- * track of all of them.
- */
-#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
 
 /* Per-subsystem/per-cgroup state maintained by the system. */
 struct cgroup_subsys_state {
-- 
1.7.12.315.g682ce8b
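
The savings quoted in the commit message follow from the old definition
of CGROUP_SUBSYS_COUNT (one slot per bit of an unsigned long) and the
pointer size. A small sketch of the arithmetic, with hypothetical
helper names and the sizes passed in explicitly so that both the 32-bit
and the 64-bit case can be checked on any machine:

```c
#include <assert.h>
#include <stddef.h>

/* Old slot count: BITS_PER_BYTE * sizeof(unsigned long). */
static size_t old_slot_count(size_t long_bytes)
{
	return 8 * long_bytes;
}

/* Bytes saved per pointer array when only `used` slots remain. */
static size_t bytes_saved(size_t long_bytes, size_t ptr_bytes, size_t used)
{
	size_t old = old_slot_count(long_bytes);

	assert(used <= old);
	return (old - used) * ptr_bytes;
}
```

bytes_saved(4, 4, 12) covers the 32-bit case and bytes_saved(8, 8, 12)
the 64-bit case. Configurations with fewer than 12 controllers enabled
save more.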


* [PATCH v4 3/8] cgroup: net_prio: Do not define task_netpioidx() when not selected
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, Gao feng, Jamal Hadi Salim, John Fastabend,
	Li Zefan, Neil Horman
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

task_netprioidx() should not be defined when the configuration is
CONFIG_NETPRIO_CGROUP=n. The reason is that in a following patch
net_prio_subsys_id will only be defined if CONFIG_NETPRIO_CGROUP!=n.
When net_prio is not built at all, any caller should only get an empty
task_netprioidx() without any references to net_prio_subsys_id.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/net/netprio_cgroup.h | 12 +++++-------
 net/core/sock.c              |  2 ++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
index 2719dec..b202de8 100644
--- a/include/net/netprio_cgroup.h
+++ b/include/net/netprio_cgroup.h
@@ -18,14 +18,13 @@
 #include <linux/rcupdate.h>
 
 
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
 struct netprio_map {
 	struct rcu_head rcu;
 	u32 priomap_len;
 	u32 priomap[];
 };
 
-#ifdef CONFIG_CGROUPS
-
 struct cgroup_netprio_state {
 	struct cgroup_subsys_state css;
 	u32 prioidx;
@@ -71,18 +70,17 @@ static inline u32 task_netprioidx(struct task_struct *p)
 	rcu_read_unlock();
 	return idx;
 }
+#endif
 
-#else
+#else /* !CONFIG_NETPRIO_CGROUP */
 
 static inline u32 task_netprioidx(struct task_struct *p)
 {
 	return 0;
 }
 
-#endif /* CONFIG_NETPRIO_CGROUP */
-
-#else
 #define sock_update_netprioidx(sk, task)
-#endif
+
+#endif /* CONFIG_NETPRIO_CGROUP */
 
 #endif  /* _NET_CLS_CGROUP_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 82cadc6..ca3eaee 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1237,6 +1237,7 @@ void sock_update_classid(struct sock *sk)
 EXPORT_SYMBOL(sock_update_classid);
 #endif
 
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
 void sock_update_netprioidx(struct sock *sk, struct task_struct *task)
 {
 	if (in_interrupt())
@@ -1246,6 +1247,7 @@ void sock_update_netprioidx(struct sock *sk, struct task_struct *task)
 }
 EXPORT_SYMBOL_GPL(sock_update_netprioidx);
 #endif
+#endif
 
 /**
  *	sk_alloc - All socket objects are allocated here
-- 
1.7.12.315.g682ce8b
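
The pattern used by this patch (a real accessor when the feature is
configured, an empty inline stub otherwise) can be shown in isolation.
In the sketch below DEMO_NETPRIO_ENABLED is a hypothetical switch
standing in for IS_ENABLED(CONFIG_NETPRIO_CGROUP), and struct task is a
toy replacement for task_struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for IS_ENABLED(CONFIG_NETPRIO_CGROUP). */
#ifndef DEMO_NETPRIO_ENABLED
#define DEMO_NETPRIO_ENABLED 0
#endif

struct task {
	unsigned int prioidx;
};

#if DEMO_NETPRIO_ENABLED
/* Feature built: the accessor touches feature-specific state. */
static inline unsigned int task_netprioidx(const struct task *p)
{
	assert(p != NULL);
	return p->prioidx;
}
#else
/* Feature compiled out: callers still build and always see 0. */
static inline unsigned int task_netprioidx(const struct task *p)
{
	(void)p;
	return 0;
}
#endif
```

Because the stub references nothing that belongs to the feature,
callers compile unchanged whether or not the feature-specific symbols
exist.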


* [PATCH v4 4/8] cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, Gao feng, Jamal Hadi Salim, John Fastabend,
	Li Zefan, Neil Horman
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when
looping over the subsys array looking either at the builtin or the
module subsystems. Since all the builtin subsystems have an id which
is lower then CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will
have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids
are sorted.

We are about to change id assignment to happen only at compile time
later in this series. That means we can't rely on the above trick
since all ids will always be defined at compile time. Furthermore,
ordering the builtin subsystems and the module subsystems is not
really necessary.

So we need a different way to know whether a subsystem is a builtin or
a module one. We can use the subsys[]->module pointer for this. Any
place where we need to know if a subsys is a module we just check the
pointer. If it is NULL then the subsystem is a builtin one.
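
The pointer check described above can be sketched in userspace roughly
as follows. This is only an illustration: the sketch_* types, the
SKETCH_SUBSYS_COUNT value and the helper name are assumptions made for
the example, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct sketch_module {
	const char *name;
};

struct sketch_subsys {
	const char *name;
	struct sketch_module *module;	/* NULL means builtin */
};

#define SKETCH_SUBSYS_COUNT 4	/* assumed array size for the example */

/* Count builtin subsystems without relying on any ordering of the array. */
static int sketch_count_builtin(struct sketch_subsys *const subsys[])
{
	int i, n = 0;

	for (i = 0; i < SKETCH_SUBSYS_COUNT; i++) {
		const struct sketch_subsys *ss = subsys[i];

		/* skip empty slots and modular subsystems */
		if (!ss || ss->module)
			continue;
		n++;
	}
	return n;
}
```

The loop always walks the whole array, so builtin and modular entries
may appear in any slot.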

With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT
enum. Though we need to introduce a temporary placeholder so that we
don't get a compilation error when only CONFIG_CGROUPS is selected and
not a single controller. An empty enum definition is not valid. Later in
this series we are able to remove the placeholder again.

And with this change we get a fix for this:

kernel/cgroup.c: In function ‘cgroup_load_subsys’:
kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds]

when CONFIG_CGROUPS=y and no builtin controller was enabled.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/linux/cgroup.h |  2 +-
 kernel/cgroup.c        | 75 +++++++++++++++++++++++++++++++-------------------
 2 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 145901f..1916cdb 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -48,7 +48,7 @@ extern const struct file_operations proc_cgroup_operations;
 #define SUBSYS(_x) _x ## _subsys_id,
 enum cgroup_subsys_id {
 #include <linux/cgroup_subsys.h>
-	CGROUP_BUILTIN_SUBSYS_COUNT
+	__CGROUP_TEMPORARY_PLACEHOLDER
 };
 #undef SUBSYS
 /*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ced292d..2726d82 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -88,7 +88,7 @@ static DEFINE_MUTEX(cgroup_root_mutex);
 
 /*
  * Generate an array of cgroup subsystem pointers. At boot time, this is
- * populated up to CGROUP_BUILTIN_SUBSYS_COUNT, and modular subsystems are
+ * populated with the built in subsystems, and modular subsystems are
  * registered after that. The mutable section of this array is protected by
  * cgroup_mutex.
  */
@@ -1321,11 +1321,13 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
 	 * take duplicate reference counts on a subsystem that's already used,
 	 * but rebind_subsystems handles this case.
 	 */
-	for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		unsigned long bit = 1UL << i;
 
 		if (!(bit & opts->subsys_mask))
 			continue;
+		if (!subsys[i]->module)
+			continue;
 		if (!try_module_get(subsys[i]->module)) {
 			module_pin_failed = true;
 			break;
@@ -1337,12 +1339,14 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
 		 * raced with a module_delete call, and to the user this is
 		 * essentially a "subsystem doesn't exist" case.
 		 */
-		for (i--; i >= CGROUP_BUILTIN_SUBSYS_COUNT; i--) {
+		for (i--; i >= 0; i--) {
 			/* drop refcounts only on the ones we took */
 			unsigned long bit = 1UL << i;
 
 			if (!(bit & opts->subsys_mask))
 				continue;
+			if (!subsys[i]->module)
+				continue;
 			module_put(subsys[i]->module);
 		}
 		return -ENOENT;
@@ -1354,11 +1358,13 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
 static void drop_parsed_module_refcounts(unsigned long subsys_mask)
 {
 	int i;
-	for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		unsigned long bit = 1UL << i;
 
 		if (!(bit & subsys_mask))
 			continue;
+		if (!subsys[i]->module)
+			continue;
 		module_put(subsys[i]->module);
 	}
 }
@@ -1437,6 +1443,7 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_LIST_HEAD(&cgrp->event_list);
 	spin_lock_init(&cgrp->event_list_lock);
 	simple_xattrs_init(&cgrp->xattrs);
+	memset(cgrp->subsys, 0, sizeof(cgrp->subsys));
 }
 
 static void init_cgroup_root(struct cgroupfs_root *root)
@@ -4442,8 +4449,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	 * since cgroup_init_subsys will have already taken care of it.
 	 */
 	if (ss->module == NULL) {
-		/* a few sanity checks */
-		BUG_ON(ss->subsys_id >= CGROUP_BUILTIN_SUBSYS_COUNT);
+		/* a sanity check */
 		BUG_ON(subsys[ss->subsys_id] != ss);
 		return 0;
 	}
@@ -4457,7 +4463,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	 */
 	mutex_lock(&cgroup_mutex);
 	/* find the first empty slot in the array */
-	for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		if (subsys[i] == NULL)
 			break;
 	}
@@ -4560,7 +4566,6 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
 
 	mutex_lock(&cgroup_mutex);
 	/* deassign the subsys_id */
-	BUG_ON(ss->subsys_id < CGROUP_BUILTIN_SUBSYS_COUNT);
 	subsys[ss->subsys_id] = NULL;
 
 	/* remove subsystem from rootnode's list of subsystems */
@@ -4623,10 +4628,13 @@ int __init cgroup_init_early(void)
 	for (i = 0; i < CSS_SET_TABLE_SIZE; i++)
 		INIT_HLIST_HEAD(&css_set_table[i]);
 
-	/* at bootup time, we don't worry about modular subsystems */
-	for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
 
+		/* at bootup time, we don't worry about modular subsystems */
+		if (!ss || ss->module)
+			continue;
+
 		BUG_ON(!ss->name);
 		BUG_ON(strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN);
 		BUG_ON(!ss->create);
@@ -4659,9 +4667,12 @@ int __init cgroup_init(void)
 	if (err)
 		return err;
 
-	/* at bootup time, we don't worry about modular subsystems */
-	for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
+
+		/* at bootup time, we don't worry about modular subsystems */
+		if (!ss || ss->module)
+			continue;
 		if (!ss->early_init)
 			cgroup_init_subsys(ss);
 		if (ss->use_id)
@@ -4856,13 +4867,16 @@ void cgroup_fork_callbacks(struct task_struct *child)
 {
 	if (need_forkexit_callback) {
 		int i;
-		/*
-		 * forkexit callbacks are only supported for builtin
-		 * subsystems, and the builtin section of the subsys array is
-		 * immutable, so we don't need to lock the subsys array here.
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
+
+			/*
+			 * forkexit callbacks are only supported for
+			 * builtin subsystems.
+			 */
+			if (!ss || ss->module)
+				continue;
+
 			if (ss->fork)
 				ss->fork(child);
 		}
@@ -4967,12 +4981,13 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 	tsk->cgroups = &init_css_set;
 
 	if (run_callbacks && need_forkexit_callback) {
-		/*
-		 * modular subsystems can't use callbacks, so no need to lock
-		 * the subsys array
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
+
+			/* modular subsystems can't use callbacks */
+			if (!ss || ss->module)
+				continue;
+
 			if (ss->exit) {
 				struct cgroup *old_cgrp =
 					rcu_dereference_raw(cg->subsys[i])->cgroup;
@@ -5158,13 +5173,17 @@ static int __init cgroup_disable(char *str)
 	while ((token = strsep(&str, ",")) != NULL) {
 		if (!*token)
 			continue;
-		/*
-		 * cgroup_disable, being at boot time, can't know about module
-		 * subsystems, so we don't worry about them.
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
 
+			/*
+			 * cgroup_disable, being at boot time, can't
+			 * know about module subsystems, so we don't
+			 * worry about them.
+			 */
+			if (!ss || ss->module)
+				continue;
+
 			if (!strcmp(token, ss->name)) {
 				ss->disabled = 1;
 				printk(KERN_INFO "Disabling %s control group"
-- 
1.7.12.315.g682ce8b

^ permalink raw reply related

* [PATCH v4 2/8] cgroup: net_cls: Do not define task_cls_classid() when not selected
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, Gao feng, Jamal Hadi Salim, John Fastabend,
	Li Zefan, Neil Horman
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

task_cls_classid() should not be defined when the configuration is
CONFIG_NET_CLS_CGROUP=n. The reason is that in a following patch the
net_cls_subsys_id will only be defined if CONFIG_NET_CLS_CGROUP!=n.
When net_cls is not built at all a caller should only get an empty
task_cls_classid() without any references to net_cls_subsys_id.
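
The three-way selection (builtin, module, not built) can be tried out
in userspace with the sketch below. It is modelled on the tristate
helpers in include/linux/kconfig.h, but the sketch_* macros, the
pretend configuration and sketch_classid_variant() are assumptions
made for this example only.

```c
/* Userspace re-creation of the tristate test (sketch, not kernel code). */
#define SKETCH_PLACEHOLDER_1 0,
#define sketch_enabled(cfg)		sketch_enabled_1(cfg)
#define sketch_enabled_1(value)		sketch_enabled_2(SKETCH_PLACEHOLDER_##value)
#define sketch_enabled_2(arg1_or_junk)	sketch_second(arg1_or_junk 1, 0, 0)
#define sketch_second(ignored, val, ...) val

#define IS_BUILTIN(option)	sketch_enabled(option)
#define IS_MODULE(option)	sketch_enabled(option##_MODULE)
#define IS_ENABLED(option)	(IS_BUILTIN(option) || IS_MODULE(option))

/* Pretend configuration: CONFIG_NET_CLS_CGROUP=m */
#define CONFIG_NET_CLS_CGROUP_MODULE 1

/* Exactly one variant is compiled, depending on the configuration. */
#if IS_BUILTIN(CONFIG_NET_CLS_CGROUP)
static const char *sketch_classid_variant(void) { return "builtin"; }
#elif IS_MODULE(CONFIG_NET_CLS_CGROUP)
static const char *sketch_classid_variant(void) { return "module"; }
#else
static const char *sketch_classid_variant(void) { return "stub"; }
#endif
```

With the pretend configuration above only the module variant is
compiled. Removing the CONFIG_NET_CLS_CGROUP_MODULE define leaves just
the stub.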

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/net/cls_cgroup.h | 11 ++++++-----
 net/core/sock.c          |  2 ++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index e88527a..9bd5db9 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -17,7 +17,7 @@
 #include <linux/hardirq.h>
 #include <linux/rcupdate.h>
 
-#ifdef CONFIG_CGROUPS
+#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
 struct cgroup_cls_state
 {
 	struct cgroup_subsys_state css;
@@ -26,7 +26,7 @@ struct cgroup_cls_state
 
 extern void sock_update_classid(struct sock *sk);
 
-#ifdef CONFIG_NET_CLS_CGROUP
+#if IS_BUILTIN(CONFIG_NET_CLS_CGROUP)
 static inline u32 task_cls_classid(struct task_struct *p)
 {
 	int classid;
@@ -41,7 +41,8 @@ static inline u32 task_cls_classid(struct task_struct *p)
 
 	return classid;
 }
-#else
+#elif IS_MODULE(CONFIG_NET_CLS_CGROUP)
+
 extern int net_cls_subsys_id;
 
 static inline u32 task_cls_classid(struct task_struct *p)
@@ -63,7 +64,7 @@ static inline u32 task_cls_classid(struct task_struct *p)
 	return classid;
 }
 #endif
-#else
+#else /* !CGROUP_NET_CLS_CGROUP */
 static inline void sock_update_classid(struct sock *sk)
 {
 }
@@ -72,5 +73,5 @@ static inline u32 task_cls_classid(struct task_struct *p)
 {
 	return 0;
 }
-#endif
+#endif /* CGROUP_NET_CLS_CGROUP */
 #endif  /* _NET_CLS_CGROUP_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 8f67ced..82cadc6 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1223,6 +1223,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 }
 
 #ifdef CONFIG_CGROUPS
+#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
 void sock_update_classid(struct sock *sk)
 {
 	u32 classid;
@@ -1234,6 +1235,7 @@ void sock_update_classid(struct sock *sk)
 		sk->sk_classid = classid;
 }
 EXPORT_SYMBOL(sock_update_classid);
+#endif
 
 void sock_update_netprioidx(struct sock *sk, struct task_struct *task)
 {
-- 
1.7.12.315.g682ce8b

^ permalink raw reply related

* [PATCH v4 0/8] cgroup: Assign subsystem IDs during compile time
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, David S. Miller, Paul E. McKenney, Andrew Morton,
	Eric Dumazet, Gao feng, Glauber Costa, Herbert Xu,
	Jamal Hadi Salim, John Fastabend, Kamezawa Hiroyuki, Li Zefan,
	Neil Horman, Tejun Heo

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

Hi,

I've removed the useless test in patch #4 and updated the commit message
of patch #7.

While rewriting the commit message of #7 I realized the pointer check was
completely wrong. Instead of testing the return value of
task_subsys_state() I tested the pointer returned by container_of. For
more details on this see the commit message.

Because of this I added Herbert and Paul to the Cc list. Please have a
close look at my rambling on the RCU part in patch #7. Thanks a lot!

This series is against 

     git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-3.7

cheers,
daniel

Previous cover letters:

v3:

In this version I tried to concentrate on the main topic of this
series, so I removed some of the things which were not really needed
and I have to admit the result looks much better. So I hope that will
simplify the review for you.

I reordered some of the patches and dropped the jump label
optimization for now. When this series is applied, then I can follow
up with those changes.

Overall, I tried to address all comments I got on v2. I didn't address
Tejun's comment on

  cgroup: Assign subsystem IDs during compile time

to split the net_cls and net_prio changes from that patch.  But I
tried to 'fix' this by being a bit more verbose.

The last patch is then the sweet one which gives some memory
back. 

v2:

The most notable change is that enabling/disabling of the jump labels
no longer happens inside the cgroup_lock (create/destroy cb). Instead
the corresponding functions will be called on module load or unload.

CGROUP_BUILTIN_SUBSYS_COUNT is also gone in this version.  This time I
trade space for speed. Some extra cycles are spent to identify the
modules in the for loops, e.g.

for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
	struct cgroup_subsys *ss = subsys[i];

	/* at bootup time, we don't worry about modular subsystems */
	if (!ss || ss->module)
		continue;

	[...]
}

CGROUP_SUBSYS_COUNT is currently 12 if all controllers are built.  I
haven't found any other way to get rid of CGROUP_BUILTIN_SUBSYS_COUNT
without real dirty preprocessor tricks.

Finally, the two versions of task_cls_classid() and task_netprioidx()
are merged together.

v1:

I was able to 'fix' the CGROUP_BUILTIN_SUBSYS_COUNT definition. With this
version there is no unused subsys_id.

The number of builtin subsystems is counted with gcc's predefined
__COUNTER__ macro. This is a bit fragile, because __COUNTER__
is only reset to 0 per compile unit. There is a workaround for this.
When starting to enumerate we need to store the current value of
__COUNTER__ and then subtract that from all enums we define.

Not sure if that is okay or not.
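
For illustration, the __COUNTER__ workaround described for v1 might
look roughly like the sketch below. It is not taken from the v1 patch.
All SKETCH_* names and the three example subsystems are assumptions,
and __COUNTER__ itself is a gcc/clang extension.

```c
/* Remember where __COUNTER__ stands before we start enumerating. */
enum { sketch_counter_base = __COUNTER__ };

/* Each use consumes one counter value; subtract the base to start at 0. */
#define SKETCH_SUBSYS(name) \
	name##_subsys_id = __COUNTER__ - sketch_counter_base - 1,

enum sketch_subsys_id {
	SKETCH_SUBSYS(cpuset)
	SKETCH_SUBSYS(debug)
	SKETCH_SUBSYS(freezer)
	/* one past the last id, i.e. the number of entries above */
	SKETCH_BUILTIN_COUNT = __COUNTER__ - sketch_counter_base - 1
};
```

Because the base is captured first, the ids start at 0 no matter how
often __COUNTER__ was used earlier in the same compile unit.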

v0:

Patches #1 and #2 are there to be able to introduce (#3, #4) the
jump labels in task_cls_classid() and task_netprioidx(). The jump
labels are needed to know when it is safe to access the controller.
For example, it is not safe while the module is not yet loaded.

All those patches are just preparation for the centerpiece (#5)
of this series. This one removes the dynamic subsystem ID
generation and falls back to compile-time generated IDs.

This is the first result from the discussion around the
"cgroup cls & netprio 'cleanups'" patches.

These patches are against net-next

v4: - removed unnecessary testing in patch #4
    - updated commit message in patch #7
    - fixed wrong pointer check in patch #7
v3: - dropping unrelated patches such as the jump label patch
    - reordered the patches
    - split "cgroup: Assign subsystem IDs during compile time" patch a bit
    - fixed the ordering dependency when assigning the subsystems
    - removed synchronize_rcu() calls
    - more verbose commit messages
v2: - do not use dirty preprocessor tricks:
      use ss->module to identify modules in the loops.
    - enable/disable jump labels in module load/unload functions
    - merge builtin/module versions of task_cls_classid() and task_netprioidx
v1: - only use jump labels when built as module (#3, #4)
    - get rid of the additional 'pointer' (#5)
v0: - initial version

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org

Daniel Wagner (8):
  cgroup: net_cls: Move sock_update_classid() declaration to
    cls_cgroup.h
  cgroup: net_cls: Do not define task_cls_classid() when not selected
  cgroup: net_prio: Do not define task_netpioidx() when not selected
  cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
  cgroup: Wrap subsystem selection macro
  cgroup: Do not depend on a given order when populating the subsys
    array
  cgroup: Assign subsystem IDs during compile time
  cgroup: Define CGROUP_SUBSYS_COUNT according the configuration

 include/linux/cgroup.h        | 12 +++---
 include/linux/cgroup_subsys.h | 24 +++++------
 include/net/cls_cgroup.h      | 27 ++++++------
 include/net/netprio_cgroup.h  | 30 +++++--------
 include/net/sock.h            |  8 ----
 kernel/cgroup.c               | 98 ++++++++++++++++++++++---------------------
 net/core/netprio_cgroup.c     | 11 -----
 net/core/sock.c               | 15 ++-----
 net/sched/cls_cgroup.c        | 13 ------
 9 files changed, 97 insertions(+), 141 deletions(-)

-- 
1.7.12.315.g682ce8b

^ permalink raw reply

* [PATCH v4 5/8] cgroup: Wrap subsystem selection macro
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, Gao feng, Jamal Hadi Salim, John Fastabend,
	Li Zefan, Neil Horman, Tejun Heo
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

Before we are able to define all subsystem ids at compile time we need
finer-grained control over what gets defined when we include
cgroup_subsys.h. For example, we define the enums for the subsystems or
declare the struct cgroup_subsys instances (builtin subsystems) by
including cgroup_subsys.h and defining SUBSYS accordingly.

Currently, the decision whether a subsys is used is made inside the
header by testing for CONFIG_*=y. By moving this test outside
of cgroup_subsys.h we are able to control it at the include level.

This is done by introducing IS_SUBSYS_ENABLED, which is then defined
according to the task, e.g. testing for CONFIG_*=y or CONFIG_*=m.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/linux/cgroup.h        |  4 ++++
 include/linux/cgroup_subsys.h | 24 ++++++++++++------------
 kernel/cgroup.c               |  1 +
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 1916cdb..a5ab565 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -46,10 +46,12 @@ extern const struct file_operations proc_cgroup_operations;
 
 /* Define the enumeration of all builtin cgroup subsystems */
 #define SUBSYS(_x) _x ## _subsys_id,
+#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
 enum cgroup_subsys_id {
 #include <linux/cgroup_subsys.h>
 	__CGROUP_TEMPORARY_PLACEHOLDER
 };
+#undef IS_SUBSYS_ENABLED
 #undef SUBSYS
 /*
  * This define indicates the maximum number of subsystems that can be loaded
@@ -528,7 +530,9 @@ struct cgroup_subsys {
 };
 
 #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
+#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
 #include <linux/cgroup_subsys.h>
+#undef IS_SUBSYS_ENABLED
 #undef SUBSYS
 
 static inline struct cgroup_subsys_state *cgroup_subsys_state(
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index dfae957..f204a7a 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -7,73 +7,73 @@
 
 /* */
 
-#ifdef CONFIG_CPUSETS
+#if IS_SUBSYS_ENABLED(CONFIG_CPUSETS)
 SUBSYS(cpuset)
 #endif
 
 /* */
 
-#ifdef CONFIG_CGROUP_DEBUG
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEBUG)
 SUBSYS(debug)
 #endif
 
 /* */
 
-#ifdef CONFIG_CGROUP_SCHED
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_SCHED)
 SUBSYS(cpu_cgroup)
 #endif
 
 /* */
 
-#ifdef CONFIG_CGROUP_CPUACCT
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_CPUACCT)
 SUBSYS(cpuacct)
 #endif
 
 /* */
 
-#ifdef CONFIG_MEMCG
+#if IS_SUBSYS_ENABLED(CONFIG_MEMCG)
 SUBSYS(mem_cgroup)
 #endif
 
 /* */
 
-#ifdef CONFIG_CGROUP_DEVICE
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEVICE)
 SUBSYS(devices)
 #endif
 
 /* */
 
-#ifdef CONFIG_CGROUP_FREEZER
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_FREEZER)
 SUBSYS(freezer)
 #endif
 
 /* */
 
-#ifdef CONFIG_NET_CLS_CGROUP
+#if IS_SUBSYS_ENABLED(CONFIG_NET_CLS_CGROUP)
 SUBSYS(net_cls)
 #endif
 
 /* */
 
-#ifdef CONFIG_BLK_CGROUP
+#if IS_SUBSYS_ENABLED(CONFIG_BLK_CGROUP)
 SUBSYS(blkio)
 #endif
 
 /* */
 
-#ifdef CONFIG_CGROUP_PERF
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_PERF)
 SUBSYS(perf)
 #endif
 
 /* */
 
-#ifdef CONFIG_NETPRIO_CGROUP
+#if IS_SUBSYS_ENABLED(CONFIG_NETPRIO_CGROUP)
 SUBSYS(net_prio)
 #endif
 
 /* */
 
-#ifdef CONFIG_CGROUP_HUGETLB
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_HUGETLB)
 SUBSYS(hugetlb)
 #endif
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2726d82..769600c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -93,6 +93,7 @@ static DEFINE_MUTEX(cgroup_root_mutex);
  * cgroup_mutex.
  */
 #define SUBSYS(_x) &_x ## _subsys,
+#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
 static struct cgroup_subsys *subsys[CGROUP_SUBSYS_COUNT] = {
 #include <linux/cgroup_subsys.h>
 };
-- 
1.7.12.315.g682ce8b

^ permalink raw reply related

* [PATCH v4 1/8] cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
From: Daniel Wagner @ 2012-09-12 14:12 UTC (permalink / raw)
  To: netdev, cgroups
  Cc: Daniel Wagner, Gao feng, Jamal Hadi Salim, John Fastabend,
	Li Zefan, Neil Horman
In-Reply-To: <1347459128-32236-1-git-send-email-wagi@monom.org>

From: Daniel Wagner <daniel.wagner@bmw-carit.de>

The only user of sock_update_classid() is net/socket.c which happens
to include cls_cgroup.h directly.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/net/cls_cgroup.h | 6 ++++++
 include/net/sock.h       | 8 --------
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index a4dc5b0..e88527a 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -24,6 +24,8 @@ struct cgroup_cls_state
 	u32 classid;
 };
 
+extern void sock_update_classid(struct sock *sk);
+
 #ifdef CONFIG_NET_CLS_CGROUP
 static inline u32 task_cls_classid(struct task_struct *p)
 {
@@ -62,6 +64,10 @@ static inline u32 task_cls_classid(struct task_struct *p)
 }
 #endif
 #else
+static inline void sock_update_classid(struct sock *sk)
+{
+}
+
 static inline u32 task_cls_classid(struct task_struct *p)
 {
 	return 0;
diff --git a/include/net/sock.h b/include/net/sock.h
index 72132ae..160a680 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1486,14 +1486,6 @@ extern void *sock_kmalloc(struct sock *sk, int size,
 extern void sock_kfree_s(struct sock *sk, void *mem, int size);
 extern void sk_send_sigurg(struct sock *sk);
 
-#ifdef CONFIG_CGROUPS
-extern void sock_update_classid(struct sock *sk);
-#else
-static inline void sock_update_classid(struct sock *sk)
-{
-}
-#endif
-
 /*
  * Functions to fill in entries in struct proto_ops when a protocol
  * does not implement a particular function.
-- 
1.7.12.315.g682ce8b

^ permalink raw reply related

* Re: [PATCH] net_tx_action: Call trace_consume_skb() instead of trace_kfree_skb()
From: Eric Dumazet @ 2012-09-12 13:39 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: netdev, sanagi.koki, davem
In-Reply-To: <20120912132055.GA2884@BohrerMBP.rgmadvisors.com>

On Wed, 2012-09-12 at 08:20 -0500, Shawn Bohrer wrote:

> But I guess your question is who puts the skb on the completion_queue.
> In my case it looks like:
> 
> dev_kfree_skb_irq()
> dev_kfree_skb_any()
> mlx4_en_free_tx_desc()
> mlx4_en_process_tx_cq()
> mlx4_en_xmit()
> dev_hard_start_xmit()
> # from here up the stack there seems to be several paths one of which is
> dev_queue_xmit()
> ip_finish_output()

> Does this answer your question about how I'm hitting this tracepoint?

Yes : this driver can do TX completion from its start_xmit() as well...


We need to add new helpers for dev_kfree_skb_any() and
dev_kfree_skb_irq(), but it's quite a lot of work.

An alternative would be to set a bit in skb (skb->consumed)

^ permalink raw reply

* Re: [PATCH] net_tx_action: Call trace_consume_skb() instead of trace_kfree_skb()
From: Shawn Bohrer @ 2012-09-12 13:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, sanagi.koki, davem
In-Reply-To: <1347435199.13103.690.camel@edumazet-glaptop>

On Wed, Sep 12, 2012 at 09:33:19AM +0200, Eric Dumazet wrote:
> On Tue, 2012-09-11 at 18:28 -0500, Shawn Bohrer wrote:
> > Call trace_consume_skb() instead of trace_kfree_skb() as skbs are
> > removed from the completion_queue during transmit.  This avoids false
> > positives from dropwatch/drop_monitor making them more useful.
> > 
> > Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> > ---
> > 
> > In my case I seem to hit this tracepoint for every packet I transmit so
> > these appear to be false positives to me.  Perhaps there are cases where
> > you could hit this and it is a real packet drop?
> > 
> >  net/core/dev.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8398836..00774ce 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3015,7 +3015,7 @@ static void net_tx_action(struct softirq_action *h)
> >  			clist = clist->next;
> >  
> >  			WARN_ON(atomic_read(&skb->users));
> > -			trace_kfree_skb(skb, net_tx_action);
> > +			trace_consume_skb(skb);
> >  			__kfree_skb(skb);
> >  		}
> >  	}
> > -- 
> > 1.7.7.6
> > 
> > 
> 
> 
> Problem here is : we dont know if caller of dev_kfree_skb_irq(skb)
> wanted to drop or consume skb.
> 
> (We dont have a dev_consume_skb_irq(skb) function)
> 
> For example, drivers/infiniband/ulp/ipoib/ipoib_main.c function
> path_free() does :
> 
> while ((skb = __skb_dequeue(&path->queue)))
> 	dev_kfree_skb_irq(skb);
> 
> Are these packets dropped or consumed, I dont really know...

Thanks Eric, that is what I was afraid of.

> Note : NAPI drivers dont use dev_kfree_skb_irq(skb).
> 
> What NIC driver are you using? I thought it was Mellanox (which is
> NAPI).

Yes this should be mlx4_en. For the record here are the call stacks to
the tracepoints I see:

    md_connector-7067  [010]   367.831826: kfree_skb:            skbaddr=0xffff8805d2d0c700 protocol=2048 location=0xffffffff813e3cc0
    md_connector-7067  [010]   367.831831: kernel_stack:         <stack trace>
=> __do_softirq (ffffffff8103e690)
=> call_softirq (ffffffff8149ca4c)
=> do_softirq (ffffffff81003ef5)
=> local_bh_enable (ffffffff8103e2c4)
=> dev_queue_xmit (ffffffff813e55f3)
=> ip_finish_output (ffffffff8141b2cb)
=> ip_output (ffffffff8141bda6)
=> ip_local_out (ffffffff8141b529)
=> ip_send_skb (ffffffff8141c80b)
=> udp_send_skb (ffffffff8143ec16)
=> udp_sendmsg (ffffffff8143fd61)
=> inet_sendmsg (ffffffff8144a614)
=> sock_sendmsg (ffffffff813cc647)
=> __sys_sendmsg (ffffffff813cde16)
=> sys_sendmsg (ffffffff813cfab9)
=> system_call_fastpath (ffffffff8149b712)

But I guess your question is who puts the skb on the completion_queue.
In my case it looks like:

dev_kfree_skb_irq()
dev_kfree_skb_any()
mlx4_en_free_tx_desc()
mlx4_en_process_tx_cq()
mlx4_en_xmit()
dev_hard_start_xmit()
# from here up the stack there seems to be several paths one of which is
dev_queue_xmit()
ip_finish_output()
ip_output()
ip_local_out()
ip_send_skb()
udp_send_skb()
udp_sendmsg()
inet_sendmsg()
sock_sendmsg()
__sys_sendmsg()
sys_sendmsg()

Does this answer your question about how I'm hitting this tracepoint?

Thanks,
Shawn

^ permalink raw reply

* Re: NULL deref in bnx2 / crashes ? ( was: netconsole leads to stalled CPU task )
From: Eric Dumazet @ 2012-09-12 13:05 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <k2q0ck$rpr$1@ger.gmane.org>

On Wed, 2012-09-12 at 12:49 +0000, Cong Wang wrote:
> On Wed, 12 Sep 2012 at 11:53 GMT, Sylvain Munaut <s.munaut@whatever-company.com> wrote:
> > Hi Eric,
> >
> > AFAICT, the following patch was never merged but I just confirmed that
> > I do need it over 3.6-rc5 for things to work properly.
> >
> 
> Yes, indeed.
> 
> Eric, please resend your patch?
> 

Yes, but I am worried about why it is needed.

Isn't it covering up a bug elsewhere?

^ permalink raw reply

* Re: NULL deref in bnx2 / crashes ? ( was: netconsole leads to stalled CPU task )
From: Cong Wang @ 2012-09-12 12:49 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CAF6-1L5hKU-VM0E5vCwExJ4kb8YB=ndotgfH7pbpxO6Q7+JHUQ@mail.gmail.com>

On Wed, 12 Sep 2012 at 11:53 GMT, Sylvain Munaut <s.munaut@whatever-company.com> wrote:
> Hi Eric,
>
> AFAICT, the following patch was never merged but I just confirmed that
> I do need it over 3.6-rc5 for things to work properly.
>

Yes, indeed.

Eric, please resend your patch?

^ permalink raw reply

* [PATCH V2] netfilter/iptables: Fix log-level processing
From: Joe Perches @ 2012-09-12 12:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: auto75914331, netfilter, coreteam, netdev, bridge, linux-kernel,
	Bart De Schuymer, netfilter-devel, Stephen Hemminger,
	Patrick McHardy, Pablo Neira Ayuso
In-Reply-To: <1347437245.13103.697.camel@edumazet-glaptop>

auto75914331@hushmail.com reports that iptables does not correctly
output the KERN_<level>.

$IPTABLES -A RULE_0_in  -j LOG  --log-level notice --log-prefix "DENY  in: "

result with linux 3.6-rc5
Sep 12 06:37:29 xxxxx kernel: <5>DENY  in: IN=eth0 OUT= MAC=.......

result with linux 3.5.3 and older:
Sep  9 10:43:01 xxxxx kernel: DENY  in: IN=eth0 OUT= MAC......

commit 04d2c8c83d0
("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern")
updated the syslog header style but did not update netfilter uses.

Do so.

Signed-off-by: Joe Perches <joe@perches.com>
cc: auto75914331@hushmail.com
---
v2: Use KERN_SOH and string concatenation instead of "%c" KERN_SOH_ASCII
as suggested by Eric Dumazet.
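
For reference, the two byte header can be reproduced in userspace with
the sketch below. SKETCH_KERN_SOH mirrors the kernel's KERN_SOH
("\001"); the helper name and the use of snprintf() are assumptions
made for the example.

```c
#include <stdio.h>

#define SKETCH_KERN_SOH "\001"	/* same byte the kernel uses for KERN_SOH */

/*
 * Write SOH, the level digit and the prefix into buf.
 * level is expected to be in the range 0..7.
 */
static int sketch_log_header(char *buf, size_t len, int level,
			     const char *prefix)
{
	return snprintf(buf, len, SKETCH_KERN_SOH "%c%s", '0' + level, prefix);
}
```

String concatenation puts the SOH byte into the format string itself,
so only the level digit needs a "%c" conversion.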

 net/bridge/netfilter/ebt_log.c |    2 +-
 net/netfilter/xt_LOG.c         |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/netfilter/ebt_log.c b/net/bridge/netfilter/ebt_log.c
index f88ee53..92de5e5 100644
--- a/net/bridge/netfilter/ebt_log.c
+++ b/net/bridge/netfilter/ebt_log.c
@@ -80,7 +80,7 @@ ebt_log_packet(u_int8_t pf, unsigned int hooknum,
 	unsigned int bitmask;
 
 	spin_lock_bh(&ebt_log_lock);
-	printk("<%c>%s IN=%s OUT=%s MAC source = %pM MAC dest = %pM proto = 0x%04x",
+	printk(KERN_SOH "%c%s IN=%s OUT=%s MAC source = %pM MAC dest = %pM proto = 0x%04x",
 	       '0' + loginfo->u.log.level, prefix,
 	       in ? in->name : "", out ? out->name : "",
 	       eth_hdr(skb)->h_source, eth_hdr(skb)->h_dest,
diff --git a/net/netfilter/xt_LOG.c b/net/netfilter/xt_LOG.c
index ff5f75f..d1609dd 100644
--- a/net/netfilter/xt_LOG.c
+++ b/net/netfilter/xt_LOG.c
@@ -436,8 +436,8 @@ log_packet_common(struct sbuff *m,
 		  const struct nf_loginfo *loginfo,
 		  const char *prefix)
 {
-	sb_add(m, "<%d>%sIN=%s OUT=%s ", loginfo->u.log.level,
-	       prefix,
+	sb_add(m, KERN_SOH "%c%sIN=%s OUT=%s ",
+	       '0' + loginfo->u.log.level, prefix,
 	       in ? in->name : "",
 	       out ? out->name : "");
 #ifdef CONFIG_BRIDGE_NETFILTER
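
As an illustration of the header format discussed in this patch, here is a minimal user-space sketch, not kernel code. It assumes only what the patch shows: KERN_SOH is a string literal (the single byte "\001") that is concatenated with the format string, and the level is emitted as one ASCII digit via '0' + level. The names MODEL_KERN_SOH and model_log_header() are hypothetical and exist only for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the kernel's KERN_SOH ("\001", ASCII Start Of Header). */
#define MODEL_KERN_SOH "\001"

/*
 * Hypothetical helper: build "<SOH><level digit><prefix>IN=<in> OUT=<out> "
 * into buf. level is expected to be in 0..7 so that '0' + level is a
 * single ASCII digit. Returns what snprintf() returns.
 */
static int model_log_header(char *buf, size_t len, unsigned int level,
			    const char *prefix, const char *in,
			    const char *out)
{
	/* Adjacent string literals are concatenated at compile time. */
	return snprintf(buf, len, MODEL_KERN_SOH "%c%sIN=%s OUT=%s ",
			(char)('0' + level), prefix, in, out);
}
```

With level 5 (notice) the buffer starts with the two bytes 0x01 and '5', which is the 2 byte pattern that printk consumes instead of printing a literal "<5>".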

^ permalink raw reply related

* [PATCH net-next 2/2] ipv6: dont cache cloned routes
From: Eric Dumazet @ 2012-09-12 12:01 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Lorenzo Colitti, Maciej Żenczykowski, Tom Herbert

From: Eric Dumazet <edumazet@google.com>

We can now destroy cloned routes immediately from dst_release() instead
of depending on garbage collection.

Set DST_NOCACHE in rt6_alloc_clone() so that:

1) we avoid calling ip6_ins_rt() on such routes

2) dst_release() can call destroy when refcount becomes 0

This helps machines withstand DDoS attacks.

Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
---
 net/ipv6/route.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d4ba3fc..fedbb41 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -840,6 +840,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
 	struct rt6_info *rt = ip6_rt_copy(ort, daddr);
 
 	if (rt) {
+		rt->dst.flags |= DST_NOCACHE;
 		rt->rt6i_flags |= RTF_CACHE;
 		rt->n = neigh_clone(ort->n);
 	}
@@ -887,7 +888,7 @@ restart:
 
 	dst_hold(&rt->dst);
 	if (nrt) {
-		err = ip6_ins_rt(nrt);
+		err = (nrt->dst.flags & DST_NOCACHE) ? 0 : ip6_ins_rt(nrt);
 		if (!err)
 			goto out2;
 	}
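
The effect described in the commit message can be sketched with a simplified user-space model, assuming only that an entry flagged as uncached is destroyed as soon as its reference count drops to zero, while other entries are left for a later garbage collection pass. All names here (model_dst, MODEL_DST_NOCACHE, model_dst_release() and so on) and the flag value are hypothetical; this is not the kernel's dst implementation.

```c
#include <stdlib.h>

#define MODEL_DST_NOCACHE 0x0010u	/* illustrative flag value only */

struct model_dst {
	int refcnt;
	unsigned int flags;
	int *destroyed;		/* set to 1 when the entry is freed */
};

/* Allocate an entry holding one reference. */
static struct model_dst *model_alloc_clone(unsigned int flags, int *destroyed)
{
	struct model_dst *dst = calloc(1, sizeof(*dst));

	if (!dst)
		return NULL;
	dst->refcnt = 1;
	dst->flags = flags;
	dst->destroyed = destroyed;
	return dst;
}

static void model_dst_hold(struct model_dst *dst)
{
	dst->refcnt++;
}

/*
 * Drop one reference. An uncached entry is freed immediately when the
 * count reaches zero; a cached one is left alone (in this model, for
 * the caller to clean up). Returns 1 if the entry was destroyed.
 */
static int model_dst_release(struct model_dst *dst)
{
	if (!dst)
		return 0;
	if (--dst->refcnt == 0 && (dst->flags & MODEL_DST_NOCACHE)) {
		if (dst->destroyed)
			*dst->destroyed = 1;
		free(dst);
		return 1;
	}
	return 0;
}
```

In this model a clone created with MODEL_DST_NOCACHE never needs to be inserted into a table or scanned by a collector: the last release frees it.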

^ permalink raw reply related

* [PATCH net-next 1/2] ipv6: force RTF_NONEXTHOP for SIT device
From: Eric Dumazet @ 2012-09-12 12:01 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Lorenzo Colitti, Maciej Żenczykowski, Tom Herbert

From: Eric Dumazet <edumazet@google.com>

We have special handling of SIT devices in addrconf_prefix_route()
to avoid using a neighbour for each destination.

If the routing entry is:

ip -6 route add 2001:db8::/64 dev sit1

Then the kernel will create a new route for every new address
under 2001:db8::/64 that we send a packet to (potentially, 2^64
routes).

Under load, we immediately get the infamous "Neighbour table overflow"
message, and the machine eventually crashes.

This does not happen if we specify a next-hop explicitly, like so:

ip -6 route add 2001:db8::/64 via fe80:: dev sit1

We can avoid this hassle by doing the SIT test in ip6_route_add() instead
of addrconf_prefix_route().

This permits ip6_pol_route() to clone the route instead of calling
rt6_alloc_cow() and allocating a neighbour.

Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
---
 net/ipv6/addrconf.c |   10 ----------
 net/ipv6/route.c    |    9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1237d5d..c6837d2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1679,16 +1679,6 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
 	};
 
 	cfg.fc_dst = *pfx;
-
-	/* Prevent useless cloning on PtP SIT.
-	   This thing is done here expecting that the whole
-	   class of non-broadcast devices need not cloning.
-	 */
-#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
-	if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
-		cfg.fc_flags |= RTF_NONEXTHOP;
-#endif
-
 	ip6_route_add(&cfg);
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399613b..d4ba3fc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1540,6 +1540,15 @@ int ip6_route_add(struct fib6_config *cfg)
 	} else
 		rt->rt6i_prefsrc.plen = 0;
 
+	/* Prevent useless cloning on PtP SIT.
+	 *  This thing is done here expecting that the whole
+	 *  class of non-broadcast devices need not cloning.
+	 */
+#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
+	if (dev && dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
+		cfg->fc_flags |= RTF_NONEXTHOP;
+#endif
+
 	if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) {
 		err = rt6_bind_neighbour(rt, dev);
 		if (err)
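
The check that this patch moves into ip6_route_add() can be modelled in isolation as follows. This is a user-space sketch under stated assumptions: the structure and the MODEL_* constants are hypothetical stand-ins chosen for the example, and only the shape of the condition is taken from the diff above.

```c
#include <stddef.h>

/* Illustrative stand-ins for ARPHRD_SIT, IFF_POINTOPOINT, RTF_NONEXTHOP. */
#define MODEL_ARPHRD_SIT	776
#define MODEL_IFF_POINTOPOINT	0x10u
#define MODEL_RTF_NONEXTHOP	0x00200000u

struct model_netdev {
	unsigned short type;
	unsigned int flags;
};

/*
 * Return the route flags to use: any route on a point-to-point SIT
 * device gets the "no next hop" flag, whoever asked for the route.
 * A NULL device leaves the flags untouched.
 */
static unsigned int model_route_flags(const struct model_netdev *dev,
				      unsigned int fc_flags)
{
	if (dev && dev->type == MODEL_ARPHRD_SIT &&
	    (dev->flags & MODEL_IFF_POINTOPOINT))
		fc_flags |= MODEL_RTF_NONEXTHOP;
	return fc_flags;
}
```

Because the test now sits in the common route-add path of the model, a route added by hand with "ip -6 route add ... dev sit1" is treated the same way as one created from a prefix announcement.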

^ permalink raw reply related

* Re: NULL deref in bnx2 / crashes ? ( was: netconsole leads to stalled CPU task )
From: Sylvain Munaut @ 2012-09-12 11:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Cong Wang
In-Reply-To: <1345640757.5158.1321.camel@edumazet-glaptop>

Hi Eric,

AFAICT, the following patch was never merged but I just confirmed that
I do need it over 3.6-rc5 for things to work properly.

>> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
>> > index 346b1eb..df731a0 100644
>> > --- a/net/core/netpoll.c
>> > +++ b/net/core/netpoll.c
>> > @@ -335,8 +335,11 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>> >         /* don't get messages out of order, and no recursion */
>> >         if (skb_queue_len(&npinfo->txq) == 0 && !netpoll_owner_active(dev)) {
>> >                 struct netdev_queue *txq;
>> > +               int queue_index = skb_get_queue_mapping(skb);
>> >
>> > -               txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
>> > +               if (queue_index >= dev->real_num_tx_queues)
>> > +                       queue_index = 0;
>> > +               txq = netdev_get_tx_queue(dev, queue_index);
>> >
>> >                 /* try until next clock tick */
>> >                 for (tries = jiffies_to_usecs(1)/USEC_PER_POLL;
>>
>>
>> Well, it doesn't solve the problem :(
>>
>> It does have an effect though. Now even on the machine with the
>> broadcom card, it just freeze the machine ...
>> On the machine with intel card, it actually does get a couple of
>> netconsole packet out and then freeze as well.
>>
>
> my patch was incomplete, sorry :
>
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 346b1eb..ddc453b 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -335,8 +335,13 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>         /* don't get messages out of order, and no recursion */
>         if (skb_queue_len(&npinfo->txq) == 0 && !netpoll_owner_active(dev)) {
>                 struct netdev_queue *txq;
> +               int queue_index = skb_get_queue_mapping(skb);
>
> -               txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
> +               if (queue_index >= dev->real_num_tx_queues) {
> +                       queue_index = 0;
> +                       skb_set_queue_mapping(skb, 0);
> +               }
> +               txq = netdev_get_tx_queue(dev, queue_index);
>
>                 /* try until next clock tick */
>                 for (tries = jiffies_to_usecs(1)/USEC_PER_POLL;

Cheers,

    Sylvain
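
For readers following the thread, the difference between the two quoted versions of the patch can be sketched in a small user-space model. The types and the function name below are hypothetical; the only behaviour taken from the quoted diff is that an out-of-range queue index falls back to queue 0 and that the second version also writes the corrected mapping back into the packet.

```c
struct model_skb {
	unsigned short queue_mapping;
};

struct model_txdev {
	unsigned int real_num_tx_queues;
};

/*
 * Pick a valid TX queue index for the packet. If the recorded mapping
 * is outside the range of active queues, use queue 0 and store that
 * choice in the packet so later users of the mapping see a valid value.
 */
static unsigned int model_pick_tx_queue(const struct model_txdev *dev,
					struct model_skb *skb)
{
	unsigned int queue_index = skb->queue_mapping;

	if (queue_index >= dev->real_num_tx_queues) {
		queue_index = 0;
		skb->queue_mapping = 0;
	}
	return queue_index;
}
```

The write-back is the part the first version lacked: clamping only the local index leaves the stale mapping in the packet for whatever code reads it next.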

^ permalink raw reply

* [V4 PATCH 6/8] csiostor: Chelsio FCoE offload driver submission (sources part 3).
From: Naresh Kumar Inna @ 2012-09-12 17:18 UTC (permalink / raw)
  To: JBottomley, linux-scsi, dm, leedom; +Cc: netdev, naresh, chethan
In-Reply-To: <1347470328-32490-1-git-send-email-naresh@chelsio.com>

This patch contains code to implement the local and remote node port
functionality. It includes tracking the firmware events for changes to
the states of these ports.

Signed-off-by: Naresh Kumar Inna <naresh@chelsio.com>
---
 drivers/scsi/csiostor/csio_lnode.c | 2148 ++++++++++++++++++++++++++++++++++++
 drivers/scsi/csiostor/csio_rnode.c |  889 +++++++++++++++
 2 files changed, 3037 insertions(+), 0 deletions(-)
 create mode 100644 drivers/scsi/csiostor/csio_lnode.c
 create mode 100644 drivers/scsi/csiostor/csio_rnode.c

diff --git a/drivers/scsi/csiostor/csio_lnode.c b/drivers/scsi/csiostor/csio_lnode.c
new file mode 100644
index 0000000..24f38a2
--- /dev/null
+++ b/drivers/scsi/csiostor/csio_lnode.c
@@ -0,0 +1,2148 @@
+/*
+ * This file is part of the Chelsio FCoE driver for Linux.
+ *
+ * Copyright (c) 2008-2012 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/slab.h>
+#include <linux/utsname.h>
+#include <scsi/scsi_transport_fc.h>
+#include <asm/unaligned.h>
+#include <scsi/fc/fc_els.h>
+#include <scsi/fc/fc_fs.h>
+#include <scsi/fc/fc_gs.h>
+#include <scsi/fc/fc_ms.h>
+
+#include "csio_hw.h"
+#include "csio_mb.h"
+#include "csio_lnode.h"
+#include "csio_rnode.h"
+
+int csio_fcoe_rnodes = 512;
+int csio_fdmi_enable = 1;
+
+#define PORT_ID_PTR(_x)         ((uint8_t *)(&_x) + 1)
+
+/* Lnode SM declarations */
+static void csio_lns_uninit(struct csio_lnode *, enum csio_ln_ev);
+static void csio_lns_online(struct csio_lnode *, enum csio_ln_ev);
+static void csio_lns_ready(struct csio_lnode *, enum csio_ln_ev);
+static void csio_lns_offline(struct csio_lnode *, enum csio_ln_ev);
+
+static int csio_ln_mgmt_submit_req(struct csio_ioreq *,
+		void (*io_cbfn) (struct csio_hw *, struct csio_ioreq *),
+		enum fcoe_cmn_type, struct csio_dma_buf *, uint32_t);
+
+/* LN event mapping */
+static enum csio_ln_ev fwevt_to_lnevt[] = {
+	CSIO_LNE_NONE,		/* None */
+	CSIO_LNE_NONE,		/* PLOGI_ACC_RCVD  */
+	CSIO_LNE_NONE,		/* PLOGI_RJT_RCVD  */
+	CSIO_LNE_NONE,		/* PLOGI_RCVD	   */
+	CSIO_LNE_NONE,		/* PLOGO_RCVD	   */
+	CSIO_LNE_NONE,		/* PRLI_ACC_RCVD   */
+	CSIO_LNE_NONE,		/* PRLI_RJT_RCVD   */
+	CSIO_LNE_NONE,		/* PRLI_RCVD	   */
+	CSIO_LNE_NONE,		/* PRLO_RCVD	   */
+	CSIO_LNE_NONE,		/* NPORT_ID_CHGD   */
+	CSIO_LNE_LOGO,		/* FLOGO_RCVD	   */
+	CSIO_LNE_LOGO,		/* CLR_VIRT_LNK_RCVD */
+	CSIO_LNE_FAB_INIT_DONE,/* FLOGI_ACC_RCVD   */
+	CSIO_LNE_NONE,		/* FLOGI_RJT_RCVD   */
+	CSIO_LNE_FAB_INIT_DONE,/* FDISC_ACC_RCVD   */
+	CSIO_LNE_NONE,		/* FDISC_RJT_RCVD   */
+	CSIO_LNE_NONE,		/* FLOGI_TMO_MAX_RETRY */
+	CSIO_LNE_NONE,		/* IMPL_LOGO_ADISC_ACC */
+	CSIO_LNE_NONE,		/* IMPL_LOGO_ADISC_RJT */
+	CSIO_LNE_NONE,		/* IMPL_LOGO_ADISC_CNFLT */
+	CSIO_LNE_NONE,		/* PRLI_TMO		*/
+	CSIO_LNE_NONE,		/* ADISC_TMO		*/
+	CSIO_LNE_NONE,		/* RSCN_DEV_LOST */
+	CSIO_LNE_NONE,		/* SCR_ACC_RCVD */
+	CSIO_LNE_NONE,		/* ADISC_RJT_RCVD */
+	CSIO_LNE_NONE,		/* LOGO_SNT */
+	CSIO_LNE_NONE,		/* PROTO_ERR_IMPL_LOGO */
+};
+
+#define CSIO_FWE_TO_LNE(_evt)	((_evt > PROTO_ERR_IMPL_LOGO) ?		\
+						CSIO_LNE_NONE :	\
+						fwevt_to_lnevt[_evt])
+
+#define csio_ct_rsp(cp)		(((struct fc_ct_hdr *)cp)->ct_cmd)
+#define csio_ct_reason(cp)	(((struct fc_ct_hdr *)cp)->ct_reason)
+#define csio_ct_expl(cp)	(((struct fc_ct_hdr *)cp)->ct_explan)
+#define csio_ct_get_pld(cp)	((void *)(((uint8_t *)cp) + FC_CT_HDR_LEN))
+
+/*
+ * csio_ln_lookup_by_portid - lookup lnode using given portid.
+ * @hw: HW module
+ * @portid: port-id.
+ *
+ * If found, returns lnode matching given portid otherwise returns NULL.
+ */
+static struct csio_lnode *
+csio_ln_lookup_by_portid(struct csio_hw *hw, uint8_t portid)
+{
+	struct csio_lnode *ln = hw->rln;
+	struct list_head *tmp;
+
+	/* Match siblings lnode with portid */
+	list_for_each(tmp, &hw->sln_head) {
+		ln = (struct csio_lnode *) tmp;
+		if (ln->portid == portid)
+			return ln;
+	}
+
+	return NULL;
+}
+
+/*
+ * csio_ln_lookup_by_vnpi - Lookup lnode using given vnp id.
+ * @hw - HW module
+ * @vnpi - vnp index.
+ * Returns - If found, returns lnode matching given vnp id
+ * otherwise returns NULL.
+ */
+static struct csio_lnode *
+csio_ln_lookup_by_vnpi(struct csio_hw *hw, uint32_t vnp_id)
+{
+	struct list_head *tmp1, *tmp2;
+	struct csio_lnode *sln = NULL, *cln = NULL;
+
+	if (list_empty(&hw->sln_head)) {
+		CSIO_INC_STATS(hw, n_lnlkup_miss);
+		return NULL;
+	}
+	/* Traverse sibling lnodes */
+	list_for_each(tmp1, &hw->sln_head) {
+		sln = (struct csio_lnode *) tmp1;
+
+		/* Match sibling lnode */
+		if (sln->vnp_flowid == vnp_id)
+			return sln;
+
+		if (list_empty(&sln->cln_head))
+			continue;
+
+		/* Traverse children lnodes */
+		list_for_each(tmp2, &sln->cln_head) {
+			cln = (struct csio_lnode *) tmp2;
+
+			if (cln->vnp_flowid == vnp_id)
+				return cln;
+		}
+	}
+	CSIO_INC_STATS(hw, n_lnlkup_miss);
+	return NULL;
+}
+
+/**
+ * csio_lnode_lookup_by_wwpn - Lookup lnode using given wwpn.
+ * @hw:		HW module.
+ * @wwpn:	WWPN.
+ *
+ * If found, returns lnode matching given wwpn, returns NULL otherwise.
+ */
+struct csio_lnode *
+csio_lnode_lookup_by_wwpn(struct csio_hw *hw, uint8_t *wwpn)
+{
+	struct list_head *tmp1, *tmp2;
+	struct csio_lnode *sln = NULL, *cln = NULL;
+
+	if (list_empty(&hw->sln_head)) {
+		CSIO_INC_STATS(hw, n_lnlkup_miss);
+		return NULL;
+	}
+	/* Traverse sibling lnodes */
+	list_for_each(tmp1, &hw->sln_head) {
+		sln = (struct csio_lnode *) tmp1;
+
+		/* Match sibling lnode */
+		if (!memcmp(csio_ln_wwpn(sln), wwpn, 8))
+			return sln;
+
+		if (list_empty(&sln->cln_head))
+			continue;
+
+		/* Traverse children lnodes */
+		list_for_each(tmp2, &sln->cln_head) {
+			cln = (struct csio_lnode *) tmp2;
+
+			if (!memcmp(csio_ln_wwpn(cln), wwpn, 8))
+				return cln;
+		}
+	}
+	return NULL;
+}
+
+/* FDMI */
+static void
+csio_fill_ct_iu(void *buf, uint8_t type, uint8_t sub_type, uint16_t op)
+{
+	struct fc_ct_hdr *cmd = (struct fc_ct_hdr *)buf;
+	cmd->ct_rev = FC_CT_REV;
+	cmd->ct_fs_type = type;
+	cmd->ct_fs_subtype = sub_type;
+	cmd->ct_cmd = op;
+}
+
+static int
+csio_hostname(uint8_t *buf, size_t buf_len)
+{
+	if (sprintf(buf, "%s", init_utsname()->nodename))
+		return 0;
+	return -1;
+}
+
+static int
+csio_osname(uint8_t *buf, size_t buf_len)
+{
+	uint8_t *ptr = buf;
+
+	strcpy(ptr, init_utsname()->sysname);
+	ptr += strlen(init_utsname()->sysname);
+	*ptr = ' '; /* SPACE */
+	strcpy(ptr, init_utsname()->release);
+	ptr += strlen(init_utsname()->release);
+	*ptr = ' '; /* SPACE */
+	strcpy(ptr, init_utsname()->version);
+	ptr += strlen(init_utsname()->version);
+	*ptr = '\0';
+	return 0;
+}
+
+static inline void
+csio_append_attrib(uint8_t **ptr, uint16_t type, uint8_t *val, uint16_t len)
+{
+	struct fc_fdmi_attr_entry *ae = (struct fc_fdmi_attr_entry *)*ptr;
+	ae->type = htons(type);
+	len += 4;		/* includes attribute type and length */
+	len = (len + 3) & ~3;	/* should be multiple of 4 bytes */
+	ae->len = htons(len);
+	memset(ae->value, 0, len - 4);
+	memcpy(ae->value, val, len);
+	*ptr += len;
+}
+
+/*
+ * csio_ln_fdmi_done - FDMI registration completion
+ * @hw: HW context
+ * @fdmi_req: fdmi request
+ */
+static void
+csio_ln_fdmi_done(struct csio_hw *hw, struct csio_ioreq *fdmi_req)
+{
+	void *cmd;
+	struct csio_lnode *ln = fdmi_req->lnode;
+
+	if (fdmi_req->wr_status != FW_SUCCESS) {
+		csio_ln_err(ln, "WR error:%x in processing fdmi rpa cmd\n",
+			    fdmi_req->wr_status);
+		CSIO_INC_STATS(ln, n_fdmi_err);
+	}
+
+	cmd = fdmi_req->dma_buf.vaddr;
+	if (ntohs(csio_ct_rsp(cmd)) != FC_FS_ACC) {
+		csio_ln_dbg(ln, "fdmi rpa cmd rejected reason %x expl %x\n",
+			    csio_ct_reason(cmd), csio_ct_expl(cmd));
+	}
+}
+
+/*
+ * csio_ln_fdmi_rhba_cbfn - RHBA completion
+ * @hw: HW context
+ * @fdmi_req: fdmi request
+ */
+static void
+csio_ln_fdmi_rhba_cbfn(struct csio_hw *hw, struct csio_ioreq *fdmi_req)
+{
+	void *cmd;
+	uint8_t *pld;
+	uint32_t len = 0;
+	struct csio_lnode *ln = fdmi_req->lnode;
+	struct fs_fdmi_attrs *attrib_blk;
+	struct fc_fdmi_port_name *port_name;
+	uint8_t buf[64];
+	uint32_t val;
+	uint8_t *fc4_type;
+
+	if (fdmi_req->wr_status != FW_SUCCESS) {
+		csio_ln_err(ln, "WR error:%x in processing fdmi rhba cmd\n",
+			    fdmi_req->wr_status);
+		CSIO_INC_STATS(ln, n_fdmi_err);
+	}
+
+	cmd = fdmi_req->dma_buf.vaddr;
+	if (ntohs(csio_ct_rsp(cmd)) != FC_FS_ACC) {
+		csio_ln_dbg(ln, "fdmi rhba cmd rejected reason %x expl %x\n",
+			    csio_ct_reason(cmd), csio_ct_expl(cmd));
+	}
+
+	if (!csio_is_rnode_ready(fdmi_req->rnode)) {
+		CSIO_INC_STATS(ln, n_fdmi_err);
+		return;
+	}
+
+	/* Prepare CT hdr for RPA cmd */
+	memset(cmd, 0, FC_CT_HDR_LEN);
+	csio_fill_ct_iu(cmd, FC_FST_MGMT, FC_FDMI_SUBTYPE, htons(FC_FDMI_RPA));
+
+	/* Prepare RPA payload */
+	pld = (uint8_t *)csio_ct_get_pld(cmd);
+	port_name = (struct fc_fdmi_port_name *)pld;
+	memcpy(&port_name->portname, csio_ln_wwpn(ln), 8);
+	pld += sizeof(*port_name);
+
+	/* Start appending Port attributes */
+	attrib_blk = (struct fs_fdmi_attrs *)pld;
+	attrib_blk->numattrs = 0;
+	len += sizeof(attrib_blk->numattrs);
+	pld += sizeof(attrib_blk->numattrs);
+
+	fc4_type = &buf[0];
+	memset(fc4_type, 0, FC_FDMI_PORT_ATTR_FC4TYPES_LEN);
+	fc4_type[2] = 1;
+	fc4_type[7] = 1;
+	csio_append_attrib(&pld, FC_FDMI_PORT_ATTR_FC4TYPES,
+			   fc4_type, FC_FDMI_PORT_ATTR_FC4TYPES_LEN);
+	attrib_blk->numattrs++;
+	val = htonl(FC_PORTSPEED_1GBIT | FC_PORTSPEED_10GBIT);
+	csio_append_attrib(&pld, FC_FDMI_PORT_ATTR_SUPPORTEDSPEED,
+			   (uint8_t *)&val,
+			   FC_FDMI_PORT_ATTR_SUPPORTEDSPEED_LEN);
+	attrib_blk->numattrs++;
+
+	if (hw->pport[ln->portid].link_speed == FW_PORT_CAP_SPEED_1G)
+		val = htonl(FC_PORTSPEED_1GBIT);
+	else if (hw->pport[ln->portid].link_speed == FW_PORT_CAP_SPEED_10G)
+		val = htonl(FC_PORTSPEED_10GBIT);
+	else
+		val = htonl(CSIO_HBA_PORTSPEED_UNKNOWN);
+	csio_append_attrib(&pld, FC_FDMI_PORT_ATTR_CURRENTPORTSPEED,
+			   (uint8_t *)&val,
+			   FC_FDMI_PORT_ATTR_CURRENTPORTSPEED_LEN);
+	attrib_blk->numattrs++;
+
+	val = htonl(ln->ln_sparm.csp.sp_bb_data);
+	csio_append_attrib(&pld, FC_FDMI_PORT_ATTR_MAXFRAMESIZE,
+			   (uint8_t *)&val, FC_FDMI_PORT_ATTR_MAXFRAMESIZE_LEN);
+	attrib_blk->numattrs++;
+
+	strcpy(buf, "csiostor");
+	csio_append_attrib(&pld, FC_FDMI_PORT_ATTR_OSDEVICENAME, buf,
+			   (uint16_t)strlen(buf));
+	attrib_blk->numattrs++;
+
+	if (!csio_hostname(buf, sizeof(buf))) {
+		csio_append_attrib(&pld, FC_FDMI_PORT_ATTR_HOSTNAME,
+				   buf, (uint16_t)strlen(buf));
+		attrib_blk->numattrs++;
+	}
+	attrib_blk->numattrs = ntohl(attrib_blk->numattrs);
+	len = (uint32_t)(pld - (uint8_t *)cmd);
+
+	/* Submit FDMI RPA request */
+	spin_lock_irq(&hw->lock);
+	if (csio_ln_mgmt_submit_req(fdmi_req, csio_ln_fdmi_done,
+				FCOE_CT, &fdmi_req->dma_buf, len)) {
+		CSIO_INC_STATS(ln, n_fdmi_err);
+		csio_ln_err(ln, "Failed to issue fdmi rpa req\n");
+	}
+	spin_unlock_irq(&hw->lock);
+}
+
+/*
+ * csio_ln_fdmi_dprt_cbfn - DPRT completion
+ * @hw: HW context
+ * @fdmi_req: fdmi request
+ */
+static void
+csio_ln_fdmi_dprt_cbfn(struct csio_hw *hw, struct csio_ioreq *fdmi_req)
+{
+	void *cmd;
+	uint8_t *pld;
+	uint32_t len = 0;
+	uint32_t maxpayload = htonl(65536);
+	struct fc_fdmi_hba_identifier *hbaid;
+	struct csio_lnode *ln = fdmi_req->lnode;
+	struct fc_fdmi_rpl *reg_pl;
+	struct fs_fdmi_attrs *attrib_blk;
+	uint8_t buf[64];
+
+	if (fdmi_req->wr_status != FW_SUCCESS) {
+		csio_ln_err(ln, "WR error:%x in processing fdmi dprt cmd\n",
+			    fdmi_req->wr_status);
+		CSIO_INC_STATS(ln, n_fdmi_err);
+	}
+
+	if (!csio_is_rnode_ready(fdmi_req->rnode)) {
+		CSIO_INC_STATS(ln, n_fdmi_err);
+		return;
+	}
+	cmd = fdmi_req->dma_buf.vaddr;
+	if (ntohs(csio_ct_rsp(cmd)) != FC_FS_ACC) {
+		csio_ln_dbg(ln, "fdmi dprt cmd rejected reason %x expl %x\n",
+			    csio_ct_reason(cmd), csio_ct_expl(cmd));
+	}
+
+	/* Prepare CT hdr for RHBA cmd */
+	memset(cmd, 0, FC_CT_HDR_LEN);
+	csio_fill_ct_iu(cmd, FC_FST_MGMT, FC_FDMI_SUBTYPE, htons(FC_FDMI_RHBA));
+	len = FC_CT_HDR_LEN;
+
+	/* Prepare RHBA payload */
+	pld = (uint8_t *)csio_ct_get_pld(cmd);
+	hbaid = (struct fc_fdmi_hba_identifier *)pld;
+	memcpy(&hbaid->id, csio_ln_wwpn(ln), 8); /* HBA identifier */
+	pld += sizeof(*hbaid);
+
+	/* Register one port per hba */
+	reg_pl = (struct fc_fdmi_rpl *)pld;
+	reg_pl->numport = ntohl(1);
+	memcpy(&reg_pl->port[0].portname, csio_ln_wwpn(ln), 8);
+	pld += sizeof(*reg_pl);
+
+	/* Start appending HBA attributes hba */
+	attrib_blk = (struct fs_fdmi_attrs *)pld;
+	attrib_blk->numattrs = 0;
+	len += sizeof(attrib_blk->numattrs);
+	pld += sizeof(attrib_blk->numattrs);
+
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_NODENAME, csio_ln_wwnn(ln),
+			   FC_FDMI_HBA_ATTR_NODENAME_LEN);
+	attrib_blk->numattrs++;
+
+	memset(buf, 0, sizeof(buf));
+
+	strcpy(buf, "Chelsio Communications");
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_MANUFACTURER, buf,
+			   (uint16_t)strlen(buf));
+	attrib_blk->numattrs++;
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_SERIALNUMBER,
+			   hw->vpd.sn, (uint16_t)sizeof(hw->vpd.sn));
+	attrib_blk->numattrs++;
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_MODEL, hw->vpd.id,
+			   (uint16_t)sizeof(hw->vpd.id));
+	attrib_blk->numattrs++;
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_MODELDESCRIPTION,
+			   hw->model_desc, (uint16_t)strlen(hw->model_desc));
+	attrib_blk->numattrs++;
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_HARDWAREVERSION,
+			   hw->hw_ver, (uint16_t)sizeof(hw->hw_ver));
+	attrib_blk->numattrs++;
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_FIRMWAREVERSION,
+			   hw->fwrev_str, (uint16_t)strlen(hw->fwrev_str));
+	attrib_blk->numattrs++;
+
+	if (!csio_osname(buf, sizeof(buf))) {
+		csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_OSNAMEVERSION,
+				   buf, (uint16_t)strlen(buf));
+		attrib_blk->numattrs++;
+	}
+
+	csio_append_attrib(&pld, FC_FDMI_HBA_ATTR_MAXCTPAYLOAD,
+			   (uint8_t *)&maxpayload,
+			   FC_FDMI_HBA_ATTR_MAXCTPAYLOAD_LEN);
+	len = (uint32_t)(pld - (uint8_t *)cmd);
+	attrib_blk->numattrs++;
+	attrib_blk->numattrs = ntohl(attrib_blk->numattrs);
+
+	/* Submit FDMI RHBA request */
+	spin_lock_irq(&hw->lock);
+	if (csio_ln_mgmt_submit_req(fdmi_req, csio_ln_fdmi_rhba_cbfn,
+				FCOE_CT, &fdmi_req->dma_buf, len)) {
+		CSIO_INC_STATS(ln, n_fdmi_err);
+		csio_ln_err(ln, "Failed to issue fdmi rhba req\n");
+	}
+	spin_unlock_irq(&hw->lock);
+}
+
+/*
+ * csio_ln_fdmi_dhba_cbfn - DHBA completion
+ * @hw: HW context
+ * @fdmi_req: fdmi request
+ */
+static void
+csio_ln_fdmi_dhba_cbfn(struct csio_hw *hw, struct csio_ioreq *fdmi_req)
+{
+	struct csio_lnode *ln = fdmi_req->lnode;
+	void *cmd;
+	struct fc_fdmi_port_name *port_name;
+	uint32_t len;
+
+	if (fdmi_req->wr_status != FW_SUCCESS) {
+		csio_ln_err(ln, "WR error:%x in processing fdmi dhba cmd\n",
+			    fdmi_req->wr_status);
+		CSIO_INC_STATS(ln, n_fdmi_err);
+	}
+
+	if (!csio_is_rnode_ready(fdmi_req->rnode)) {
+		CSIO_INC_STATS(ln, n_fdmi_err);
+		return;
+	}
+	cmd = fdmi_req->dma_buf.vaddr;
+	if (ntohs(csio_ct_rsp(cmd)) != FC_FS_ACC) {
+		csio_ln_dbg(ln, "fdmi dhba cmd rejected reason %x expl %x\n",
+			    csio_ct_reason(cmd), csio_ct_expl(cmd));
+	}
+
+	/* Send FDMI cmd to de-register any Port attributes if registered
+	 * before
+	 */
+
+	/* Prepare FDMI DPRT cmd */
+	memset(cmd, 0, FC_CT_HDR_LEN);
+	csio_fill_ct_iu(cmd, FC_FST_MGMT, FC_FDMI_SUBTYPE, htons(FC_FDMI_DPRT));
+	len = FC_CT_HDR_LEN;
+	port_name = (struct fc_fdmi_port_name *)csio_ct_get_pld(cmd);
+	memcpy(&port_name->portname, csio_ln_wwpn(ln), 8);
+	len += sizeof(*port_name);
+
+	/* Submit FDMI request */
+	spin_lock_irq(&hw->lock);
+	if (csio_ln_mgmt_submit_req(fdmi_req, csio_ln_fdmi_dprt_cbfn,
+				FCOE_CT, &fdmi_req->dma_buf, len)) {
+		CSIO_INC_STATS(ln, n_fdmi_err);
+		csio_ln_err(ln, "Failed to issue fdmi dprt req\n");
+	}
+	spin_unlock_irq(&hw->lock);
+}
+
+/**
+ * csio_ln_fdmi_start - Start an FDMI request.
+ * @ln:		lnode
+ * @context:	session context
+ *
+ * Issued with lock held.
+ */
+int
+csio_ln_fdmi_start(struct csio_lnode *ln, void *context)
+{
+	struct csio_ioreq *fdmi_req;
+	struct csio_rnode *fdmi_rn = (struct csio_rnode *)context;
+	void *cmd;
+	struct fc_fdmi_hba_identifier *hbaid;
+	uint32_t len;
+
+	if (!(ln->flags & CSIO_LNF_FDMI_ENABLE))
+		return -EPROTONOSUPPORT;
+
+	if (!csio_is_rnode_ready(fdmi_rn))
+		CSIO_INC_STATS(ln, n_fdmi_err);
+
+	/* Send FDMI cmd to de-register any HBA attributes if registered
+	 * before
+	 */
+
+	fdmi_req = ln->mgmt_req;
+	fdmi_req->lnode = ln;
+	fdmi_req->rnode = fdmi_rn;
+
+	/* Prepare FDMI DHBA cmd */
+	cmd = fdmi_req->dma_buf.vaddr;
+	memset(cmd, 0, FC_CT_HDR_LEN);
+	csio_fill_ct_iu(cmd, FC_FST_MGMT, FC_FDMI_SUBTYPE, htons(FC_FDMI_DHBA));
+	len = FC_CT_HDR_LEN;
+
+	hbaid = (struct fc_fdmi_hba_identifier *)csio_ct_get_pld(cmd);
+	memcpy(&hbaid->id, csio_ln_wwpn(ln), 8);
+	len += sizeof(*hbaid);
+
+	/* Submit FDMI request */
+	if (csio_ln_mgmt_submit_req(fdmi_req, csio_ln_fdmi_dhba_cbfn,
+					FCOE_CT, &fdmi_req->dma_buf, len)) {
+		CSIO_INC_STATS(ln, n_fdmi_err);
+		csio_ln_err(ln, "Failed to issue fdmi dhba req\n");
+	}
+
+	return 0;
+}
+
+/*
+ * csio_ln_vnp_read_cbfn - vnp read completion handler.
+ * @hw: HW lnode
+ * @mbp: Completed mailbox command.
+ *
+ * Reads vnp response and updates ln parameters.
+ */
+static void
+csio_ln_vnp_read_cbfn(struct csio_hw *hw, struct csio_mb *mbp)
+{
+	struct csio_lnode *ln = ((struct csio_lnode *)mbp->priv);
+	struct fw_fcoe_vnp_cmd *rsp = (struct fw_fcoe_vnp_cmd *)(mbp->mb);
+	struct fc_els_csp *csp;
+	struct fc_els_cssp *clsp;
+	enum fw_retval retval;
+
+	spin_lock_irq(&hw->lock);
+
+	retval = FW_CMD_RETVAL_GET(ntohl(rsp->alloc_to_len16));
+	if (retval != FW_SUCCESS) {
+		csio_err(hw, "FCOE VNP read cmd returned error:0x%x\n", retval);
+		spin_unlock_irq(&hw->lock);
+		mempool_free(mbp, hw->mb_mempool);
+		return;
+	}
+
+	memcpy(ln->mac, rsp->vnport_mac, sizeof(ln->mac));
+	memcpy(&ln->nport_id, &rsp->vnport_mac[3],
+			sizeof(uint8_t)*3);
+	ln->nport_id = ntohl(ln->nport_id);
+	ln->nport_id = ln->nport_id>>8;
+
+	/* Update WWNs */
+	/*
+	 * This may look like a duplication of what csio_fcoe_enable_link()
+	 * does, but is absolutely necessary if the vnpi changes between
+	 * a FCOE LINK UP and FCOE LINK DOWN.
+	 */
+	memcpy(csio_ln_wwnn(ln), rsp->vnport_wwnn, 8);
+	memcpy(csio_ln_wwpn(ln), rsp->vnport_wwpn, 8);
+
+	/* Copy common sparam */
+	csp = (struct fc_els_csp *)rsp->cmn_srv_parms;
+	ln->ln_sparm.csp.sp_hi_ver = csp->sp_hi_ver;
+	ln->ln_sparm.csp.sp_lo_ver = csp->sp_lo_ver;
+	ln->ln_sparm.csp.sp_bb_cred = ntohs(csp->sp_bb_cred);
+	ln->ln_sparm.csp.sp_features = ntohs(csp->sp_features);
+	ln->ln_sparm.csp.sp_bb_data = ntohs(csp->sp_bb_data);
+	ln->ln_sparm.csp.sp_r_a_tov = ntohl(csp->sp_r_a_tov);
+	ln->ln_sparm.csp.sp_e_d_tov = ntohl(csp->sp_e_d_tov);
+
+	/* Copy word 0 & word 1 of class sparam */
+	clsp = (struct fc_els_cssp *)rsp->clsp_word_0_1;
+	ln->ln_sparm.clsp[2].cp_class = ntohs(clsp->cp_class);
+	ln->ln_sparm.clsp[2].cp_init = ntohs(clsp->cp_init);
+	ln->ln_sparm.clsp[2].cp_recip = ntohs(clsp->cp_recip);
+	ln->ln_sparm.clsp[2].cp_rdfs = ntohs(clsp->cp_rdfs);
+
+	spin_unlock_irq(&hw->lock);
+
+	mempool_free(mbp, hw->mb_mempool);
+
+	/* Send an event to update local attribs */
+	csio_lnode_async_event(ln, CSIO_LN_FC_ATTRIB_UPDATE);
+}
+
+/*
+ * csio_ln_vnp_read - Read vnp params.
+ * @ln: lnode
+ * @cbfn: Completion handler.
+ *
+ * Issued with lock held.
+ */
+static int
+csio_ln_vnp_read(struct csio_lnode *ln,
+		void (*cbfn) (struct csio_hw *, struct csio_mb *))
+{
+	struct csio_hw *hw = ln->hwp;
+	struct csio_mb  *mbp;
+
+	/* Allocate Mbox request */
+	mbp = mempool_alloc(hw->mb_mempool, GFP_ATOMIC);
+	if (!mbp) {
+		CSIO_INC_STATS(hw, n_err_nomem);
+		return -ENOMEM;
+	}
+
+	/* Prepare VNP Command */
+	csio_fcoe_vnp_read_init_mb(ln, mbp,
+				    CSIO_MB_DEFAULT_TMO,
+				    ln->fcf_flowid,
+				    ln->vnp_flowid,
+				    cbfn);
+
+	/* Issue MBOX cmd */
+	if (csio_mb_issue(hw, mbp)) {
+		csio_err(hw, "Failed to issue mbox FCoE VNP command\n");
+		mempool_free(mbp, hw->mb_mempool);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * csio_fcoe_enable_link - Enable fcoe link.
+ * @ln: lnode
+ * @enable: enable/disable
+ * Issued with lock held.
+ * Issues mbox cmd to bring up FCOE link on port associated with given ln.
+ */
+static int
+csio_fcoe_enable_link(struct csio_lnode *ln, bool enable)
+{
+	struct csio_hw *hw = ln->hwp;
+	struct csio_mb  *mbp;
+	enum fw_retval retval;
+	uint8_t portid;
+	uint8_t sub_op;
+	struct fw_fcoe_link_cmd *lcmd;
+	int i;
+
+	mbp = mempool_alloc(hw->mb_mempool, GFP_ATOMIC);
+	if (!mbp) {
+		CSIO_INC_STATS(hw, n_err_nomem);
+		return -ENOMEM;
+	}
+
+	portid = ln->portid;
+	sub_op = enable ? FCOE_LINK_UP : FCOE_LINK_DOWN;
+
+	csio_dbg(hw, "bringing FCOE LINK %s on Port:%d\n",
+		 sub_op ? "UP" : "DOWN", portid);
+
+	csio_write_fcoe_link_cond_init_mb(ln, mbp, CSIO_MB_DEFAULT_TMO,
+					  portid, sub_op, 0, 0, 0, NULL);
+
+	if (csio_mb_issue(hw, mbp)) {
+		csio_err(hw, "failed to issue FCOE LINK cmd on port[%d]\n",
+			portid);
+		mempool_free(mbp, hw->mb_mempool);
+		return -EINVAL;
+	}
+
+	retval = csio_mb_fw_retval(mbp);
+	if (retval != FW_SUCCESS) {
+		csio_err(hw,
+			 "FCOE LINK %s cmd on port[%d] failed with "
+			 "ret:x%x\n", sub_op ? "UP" : "DOWN", portid, retval);
+		mempool_free(mbp, hw->mb_mempool);
+		return -EINVAL;
+	}
+
+	if (!enable)
+		goto out;
+
+	lcmd = (struct fw_fcoe_link_cmd *)mbp->mb;
+
+	memcpy(csio_ln_wwnn(ln), lcmd->vnport_wwnn, 8);
+	memcpy(csio_ln_wwpn(ln), lcmd->vnport_wwpn, 8);
+
+	for (i = 0; i < CSIO_MAX_PPORTS; i++)
+		if (hw->pport[i].portid == portid)
+			memcpy(hw->pport[i].mac, lcmd->phy_mac, 6);
+
+out:
+	mempool_free(mbp, hw->mb_mempool);
+	return 0;
+}
+
+/*
+ * csio_ln_read_fcf_cbfn - Read fcf parameters
+ * @ln: lnode
+ *
+ * read fcf response and Update ln fcf information.
+ */
+static void
+csio_ln_read_fcf_cbfn(struct csio_hw *hw, struct csio_mb *mbp)
+{
+	struct csio_lnode *ln = (struct csio_lnode *)mbp->priv;
+	struct csio_fcf_info	*fcf_info;
+	struct fw_fcoe_fcf_cmd *rsp =
+				(struct fw_fcoe_fcf_cmd *)(mbp->mb);
+	enum fw_retval retval;
+
+	spin_lock_irq(&hw->lock);
+
+	retval = FW_CMD_RETVAL_GET(ntohl(rsp->retval_len16));
+	if (retval != FW_SUCCESS) {
+		csio_ln_err(ln, "FCOE FCF cmd failed with ret x%x\n",
+				retval);
+		mempool_free(mbp, hw->mb_mempool);
+		spin_unlock_irq(&hw->lock);
+		return;
+	}
+
+	fcf_info = ln->fcfinfo;
+	fcf_info->priority = FW_FCOE_FCF_CMD_PRIORITY_GET(
+					ntohs(rsp->priority_pkd));
+	fcf_info->vf_id = ntohs(rsp->vf_id);
+	fcf_info->vlan_id = rsp->vlan_id;
+	fcf_info->max_fcoe_size = ntohs(rsp->max_fcoe_size);
+	fcf_info->fka_adv = be32_to_cpu(rsp->fka_adv);
+	fcf_info->fcfi = FW_FCOE_FCF_CMD_FCFI_GET(ntohl(rsp->op_to_fcfi));
+	fcf_info->fpma = FW_FCOE_FCF_CMD_FPMA_GET(rsp->fpma_to_portid);
+	fcf_info->spma = FW_FCOE_FCF_CMD_SPMA_GET(rsp->fpma_to_portid);
+	fcf_info->login = FW_FCOE_FCF_CMD_LOGIN_GET(rsp->fpma_to_portid);
+	fcf_info->portid = FW_FCOE_FCF_CMD_PORTID_GET(rsp->fpma_to_portid);
+	memcpy(fcf_info->fc_map, rsp->fc_map, sizeof(fcf_info->fc_map));
+	memcpy(fcf_info->mac, rsp->mac, sizeof(fcf_info->mac));
+	memcpy(fcf_info->name_id, rsp->name_id, sizeof(fcf_info->name_id));
+	memcpy(fcf_info->fabric, rsp->fabric, sizeof(fcf_info->fabric));
+	memcpy(fcf_info->spma_mac, rsp->spma_mac, sizeof(fcf_info->spma_mac));
+
+	spin_unlock_irq(&hw->lock);
+	mempool_free(mbp, hw->mb_mempool);
+
+}
+
+/*
+ * csio_ln_read_fcf_entry - Read fcf entry.
+ * @ln: lnode
+ * @cbfn: Completion handler.
+ *
+ * Issued with lock held.
+ */
+static int
+csio_ln_read_fcf_entry(struct csio_lnode *ln,
+			void (*cbfn) (struct csio_hw *, struct csio_mb *))
+{
+	struct csio_hw *hw = ln->hwp;
+	struct csio_mb  *mbp;
+
+	mbp = mempool_alloc(hw->mb_mempool, GFP_ATOMIC);
+	if (!mbp) {
+		CSIO_INC_STATS(hw, n_err_nomem);
+		return -ENOMEM;
+	}
+
+	/* Get FCoE FCF information */
+	csio_fcoe_read_fcf_init_mb(ln, mbp, CSIO_MB_DEFAULT_TMO,
+				      ln->portid, ln->fcf_flowid, cbfn);
+
+	if (csio_mb_issue(hw, mbp)) {
+		csio_err(hw, "failed to issue FCOE FCF cmd\n");
+		mempool_free(mbp, hw->mb_mempool);
+		return -EINVAL;
+	}
+
+	if (cbfn == NULL) {
+		spin_unlock_irq(&hw->lock);
+		csio_ln_read_fcf_cbfn(hw, mbp);
+		spin_lock_irq(&hw->lock);
+	}
+
+	return 0;
+}
+
+/*
+ * csio_handle_link_up - Logical Linkup event.
+ * @hw - HW module.
+ * @portid - Physical port number
+ * @fcfi - FCF index.
+ * @vnpi - VNP index.
+ * Returns - none.
+ *
+ * This event is received from FW, when virtual link is established between
+ * Physical port[ENode] and FCF. If it's a new vnpi, then local node object is
+ * created on this FCF and set to [ONLINE] state.
+ * Lnode waits for FW_RDEV_CMD event to be received indicating that
+ * Fabric login is completed and lnode moves to [READY] state.
+ *
+ * This is called with the hw lock held.
+ */
+static void
+csio_handle_link_up(struct csio_hw *hw, uint8_t portid, uint32_t fcfi,
+		    uint32_t vnpi)
+{
+	struct csio_lnode *ln = NULL;
+
+	/* Lookup lnode based on vnpi */
+	ln = csio_ln_lookup_by_vnpi(hw, vnpi);
+	if (!ln) {
+		/* Pick lnode based on portid */
+		ln = csio_ln_lookup_by_portid(hw, portid);
+		if (!ln) {
+			csio_err(hw, "failed to lookup fcoe lnode on port:%d\n",
+				portid);
+			CSIO_DB_ASSERT(0);
+			return;
+		}
+
+		/* Check if lnode has valid vnp flowid */
+		if (ln->vnp_flowid != CSIO_INVALID_IDX) {
+			/* New VN-Port */
+			spin_unlock_irq(&hw->lock);
+			ln = csio_lnode_alloc(hw);
+			spin_lock_irq(&hw->lock);
+			if (!ln) {
+				csio_err(hw,
+					 "failed to allocate fcoe lnode "
+					 "for port:%d vnpi:x%x\n",
+					 portid, vnpi);
+				CSIO_DB_ASSERT(0);
+				return;
+			}
+			ln->portid = portid;
+		}
+		ln->vnp_flowid = vnpi;
+		ln->dev_num &= ~0xFFFF;
+		ln->dev_num |= vnpi;
+	}
+
+	/* Initialize fcfi */
+	ln->fcf_flowid = fcfi;
+
+	csio_info(hw, "Port:%d - FCOE LINK UP\n", portid);
+
+	CSIO_INC_STATS(ln, n_link_up);
+
+	/* Send LINKUP event to SM */
+	csio_post_event(&ln->sm, CSIO_LNE_LINKUP);
+}
+
+/*
+ * csio_post_event_rns
+ * @ln - FCOE lnode
+ * @evt - Given rnode event
+ * Returns - none
+ *
+ * Posts given rnode event to all FCOE rnodes connected with given Lnode.
+ * This routine is invoked when lnode receives LINK_DOWN/DOWN_LINK/CLOSE
+ * event.
+ *
+ * This is called with the hw lock held.
+ */
+static void
+csio_post_event_rns(struct csio_lnode *ln, enum csio_rn_ev evt)
+{
+	struct csio_rnode *rnhead = (struct csio_rnode *) &ln->rnhead;
+	struct list_head *tmp, *next;
+	struct csio_rnode *rn;
+
+	list_for_each_safe(tmp, next, &rnhead->sm.sm_list) {
+		rn = (struct csio_rnode *) tmp;
+		csio_post_event(&rn->sm, evt);
+	}
+}
+
+/*
+ * csio_cleanup_rns
+ * @ln - FCOE lnode
+ * Returns - none
+ *
+ * Frees all FCOE rnodes connected with given Lnode.
+ *
+ * This is called with the hw lock held.
+ */
+static void
+csio_cleanup_rns(struct csio_lnode *ln)
+{
+	struct csio_rnode *rnhead = (struct csio_rnode *) &ln->rnhead;
+	struct list_head *tmp, *next_rn;
+	struct csio_rnode *rn;
+
+	list_for_each_safe(tmp, next_rn, &rnhead->sm.sm_list) {
+		rn = (struct csio_rnode *) tmp;
+		csio_put_rnode(ln, rn);
+	}
+
+}
+
+/*
+ * csio_post_event_lns
+ * @ln - FCOE lnode
+ * @evt - Given lnode event
+ * Returns - none
+ *
+ * Posts given lnode event to all FCOE lnodes connected with given Lnode.
+ * This routine is invoked when lnode receives LINK_DOWN/DOWN_LINK/CLOSE
+ * event.
+ *
+ * This is called with the hw lock held.
+ */
+static void
+csio_post_event_lns(struct csio_lnode *ln, enum csio_ln_ev evt)
+{
+	struct list_head *tmp;
+	struct csio_lnode *cln, *sln;
+
+	/* If NPIV lnode, send evt only to that and return */
+	if (csio_is_npiv_ln(ln)) {
+		csio_post_event(&ln->sm, evt);
+		return;
+	}
+
+	sln = ln;
+	/* Traverse children lnodes list and send evt */
+	list_for_each(tmp, &sln->cln_head) {
+		cln = (struct csio_lnode *) tmp;
+		csio_post_event(&cln->sm, evt);
+	}
+
+	/* Send evt to parent lnode */
+	csio_post_event(&ln->sm, evt);
+}
+
+/*
+ * csio_ln_down - Local nport is down
+ * @ln - FCOE Lnode
+ * Returns - none
+ *
+ * Sends LINK_DOWN events to Lnode and its associated NPIV lnodes.
+ *
+ * This is called with the hw lock held.
+ */
+static void
+csio_ln_down(struct csio_lnode *ln)
+{
+	csio_post_event_lns(ln, CSIO_LNE_LINK_DOWN);
+}
+
+/*
+ * csio_handle_link_down - Logical Linkdown event.
+ * @hw - HW module.
+ * @portid - Physical port number
+ * @fcfi - FCF index.
+ * @vnpi - VNP index.
+ * Returns - none
+ *
+ * This event is received from FW, when virtual link goes down between
+ * Physical port[ENode] and FCF. Lnode and its associated NPIV lnodes hosted on
+ * this vnpi[VN-Port] will be de-instantiated.
+ *
+ * This is called with the hw lock held.
+ */
+static void
+csio_handle_link_down(struct csio_hw *hw, uint8_t portid, uint32_t fcfi,
+		      uint32_t vnpi)
+{
+	struct csio_fcf_info *fp;
+	struct csio_lnode *ln;
+
+	/* Lookup lnode based on vnpi */
+	ln = csio_ln_lookup_by_vnpi(hw, vnpi);
+	if (ln) {
+		fp = ln->fcfinfo;
+		CSIO_INC_STATS(ln, n_link_down);
+
+		/* Warn if linkdown is received while lnode is not ready */
+		if (!csio_is_lnode_ready(ln)) {
+			csio_ln_warn(ln,
+				"warn: FCOE link is already offline. "
+				"Ignoring Fcoe linkdown event on portid %d\n",
+				 portid);
+			CSIO_INC_STATS(ln, n_evt_drop);
+			return;
+		}
+
+		/* Verify portid */
+		if (fp->portid != portid) {
+			csio_ln_warn(ln,
+				"warn: FCOE linkdown recv with "
+				"invalid port %d\n", portid);
+			CSIO_INC_STATS(ln, n_evt_drop);
+			return;
+		}
+
+		/* verify fcfi */
+		if (ln->fcf_flowid != fcfi) {
+			csio_ln_warn(ln,
+				"warn: FCOE linkdown recv with "
+				"invalid fcfi x%x\n", fcfi);
+			CSIO_INC_STATS(ln, n_evt_drop);
+			return;
+		}
+
+		csio_info(hw, "Port:%d - FCOE LINK DOWN\n", portid);
+
+		/* Send LINK_DOWN event to lnode s/m */
+		csio_ln_down(ln);
+
+		return;
+	} else {
+		csio_warn(hw,
+			  "warn: FCOE linkdown recv with invalid vnpi x%x\n",
+			  vnpi);
+		CSIO_INC_STATS(hw, n_evt_drop);
+	}
+}
+
+/*
+ * csio_is_lnode_ready - Checks FCOE lnode is in ready state.
+ * @ln: Lnode module
+ *
+ * Returns True if FCOE lnode is in ready state.
+ */
+int
+csio_is_lnode_ready(struct csio_lnode *ln)
+{
+	return (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_ready));
+}
+
+/*****************************************************************************/
+/* START: Lnode SM                                                           */
+/*****************************************************************************/
+/*
+ * csio_lns_uninit - The request in uninit state.
+ * @ln - FCOE lnode.
+ * @evt - Event to be processed.
+ *
+ * Process the given lnode event which is currently in "uninit" state.
+ * Invoked with HW lock held.
+ * Return - none.
+ */
+static void
+csio_lns_uninit(struct csio_lnode *ln, enum csio_ln_ev evt)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+	struct csio_lnode *rln = hw->rln;
+	int rv;
+
+	CSIO_INC_STATS(ln, n_evt_sm[evt]);
+	switch (evt) {
+	case CSIO_LNE_LINKUP:
+		csio_set_state(&ln->sm, csio_lns_online);
+		/* Read FCF only for physical lnode */
+		if (csio_is_phys_ln(ln)) {
+			rv = csio_ln_read_fcf_entry(ln,
+					csio_ln_read_fcf_cbfn);
+			if (rv != 0) {
+				/* TODO: Send HW RESET event */
+				CSIO_INC_STATS(ln, n_err);
+				break;
+			}
+
+			/* Add FCF record */
+			list_add_tail(&ln->fcfinfo->list, &rln->fcf_lsthead);
+		}
+
+		rv = csio_ln_vnp_read(ln, csio_ln_vnp_read_cbfn);
+		if (rv != 0) {
+			/* TODO: Send HW RESET event */
+			CSIO_INC_STATS(ln, n_err);
+		}
+		break;
+
+	case CSIO_LNE_DOWN_LINK:
+		break;
+
+	default:
+		csio_ln_dbg(ln,
+			    "unexp ln event %d recv from did:x%x in "
+			    "ln state[uninit].\n", evt, ln->nport_id);
+		CSIO_INC_STATS(ln, n_evt_unexp);
+		break;
+	} /* switch event */
+}
+
+/*
+ * csio_lns_online - The request in online state.
+ * @ln - FCOE lnode.
+ * @evt - Event to be processed.
+ *
+ * Process the given lnode event which is currently in "online" state.
+ * Invoked with HW lock held.
+ * Return - none.
+ */
+static void
+csio_lns_online(struct csio_lnode *ln, enum csio_ln_ev evt)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+
+	CSIO_INC_STATS(ln, n_evt_sm[evt]);
+	switch (evt) {
+	case CSIO_LNE_LINKUP:
+		csio_ln_warn(ln,
+			     "warn: FCOE link is up already. "
+			     "Ignoring linkup on port:%d\n", ln->portid);
+		CSIO_INC_STATS(ln, n_evt_drop);
+		break;
+
+	case CSIO_LNE_FAB_INIT_DONE:
+		csio_set_state(&ln->sm, csio_lns_ready);
+
+		spin_unlock_irq(&hw->lock);
+		csio_lnode_async_event(ln, CSIO_LN_FC_LINKUP);
+		spin_lock_irq(&hw->lock);
+
+		break;
+
+	case CSIO_LNE_LINK_DOWN:
+		/* Fall through */
+	case CSIO_LNE_DOWN_LINK:
+		csio_set_state(&ln->sm, csio_lns_uninit);
+		if (csio_is_phys_ln(ln)) {
+			/* Remove FCF entry */
+			list_del_init(&ln->fcfinfo->list);
+		}
+		break;
+
+	default:
+		csio_ln_dbg(ln,
+			    "unexp ln event %d recv from did:x%x in "
+			    "ln state[online].\n", evt, ln->nport_id);
+		CSIO_INC_STATS(ln, n_evt_unexp);
+
+		break;
+	} /* switch event */
+}
+
+/*
+ * csio_lns_ready - The request in ready state.
+ * @ln - FCOE lnode.
+ * @evt - Event to be processed.
+ *
+ * Process the given lnode event which is currently in "ready" state.
+ * Invoked with HW lock held.
+ * Return - none.
+ */
+static void
+csio_lns_ready(struct csio_lnode *ln, enum csio_ln_ev evt)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+
+	CSIO_INC_STATS(ln, n_evt_sm[evt]);
+	switch (evt) {
+	case CSIO_LNE_FAB_INIT_DONE:
+		csio_ln_err(ln,
+			    "ignoring event %d recv from did x%x "
+			    "in ln state[ready].\n", evt, ln->nport_id);
+		CSIO_INC_STATS(ln, n_evt_drop);
+		break;
+
+	case CSIO_LNE_LINK_DOWN:
+		csio_set_state(&ln->sm, csio_lns_offline);
+		csio_post_event_rns(ln, CSIO_RNFE_DOWN);
+
+		spin_unlock_irq(&hw->lock);
+		csio_lnode_async_event(ln, CSIO_LN_FC_LINKDOWN);
+		spin_lock_irq(&hw->lock);
+
+		if (csio_is_phys_ln(ln)) {
+			/* Remove FCF entry */
+			list_del_init(&ln->fcfinfo->list);
+		}
+		break;
+
+	case CSIO_LNE_DOWN_LINK:
+		csio_set_state(&ln->sm, csio_lns_offline);
+		csio_post_event_rns(ln, CSIO_RNFE_DOWN);
+
+		/* Host needs to issue aborts in case FW has not returned
+		 * WRs with status "ABORTED"
+		 */
+		spin_unlock_irq(&hw->lock);
+		csio_lnode_async_event(ln, CSIO_LN_FC_LINKDOWN);
+		spin_lock_irq(&hw->lock);
+
+		if (csio_is_phys_ln(ln)) {
+			/* Remove FCF entry */
+			list_del_init(&ln->fcfinfo->list);
+		}
+		break;
+
+	case CSIO_LNE_CLOSE:
+		csio_set_state(&ln->sm, csio_lns_uninit);
+		csio_post_event_rns(ln, CSIO_RNFE_CLOSE);
+		break;
+
+	case CSIO_LNE_LOGO:
+		csio_set_state(&ln->sm, csio_lns_offline);
+		csio_post_event_rns(ln, CSIO_RNFE_DOWN);
+		break;
+
+	default:
+		csio_ln_dbg(ln,
+			    "unexp ln event %d recv from did:x%x in "
+			    "ln state[ready].\n", evt, ln->nport_id);
+		CSIO_INC_STATS(ln, n_evt_unexp);
+		CSIO_DB_ASSERT(0);
+		break;
+	} /* switch event */
+}
+
+/*
+ * csio_lns_offline - The request in offline state.
+ * @ln - FCOE lnode.
+ * @evt - Event to be processed.
+ *
+ * Process the given lnode event which is currently in "offline" state.
+ * Invoked with HW lock held.
+ * Return - none.
+ */
+static void
+csio_lns_offline(struct csio_lnode *ln, enum csio_ln_ev evt)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+	struct csio_lnode *rln = hw->rln;
+	int rv;
+
+	CSIO_INC_STATS(ln, n_evt_sm[evt]);
+	switch (evt) {
+	case CSIO_LNE_LINKUP:
+		csio_set_state(&ln->sm, csio_lns_online);
+		/* Read FCF only for physical lnode */
+		if (csio_is_phys_ln(ln)) {
+			rv = csio_ln_read_fcf_entry(ln,
+					csio_ln_read_fcf_cbfn);
+			if (rv != 0) {
+				/* TODO: Send HW RESET event */
+				CSIO_INC_STATS(ln, n_err);
+				break;
+			}
+
+			/* Add FCF record */
+			list_add_tail(&ln->fcfinfo->list, &rln->fcf_lsthead);
+		}
+
+		rv = csio_ln_vnp_read(ln, csio_ln_vnp_read_cbfn);
+		if (rv != 0) {
+			/* TODO: Send HW RESET event */
+			CSIO_INC_STATS(ln, n_err);
+		}
+		break;
+
+	case CSIO_LNE_LINK_DOWN:
+	case CSIO_LNE_DOWN_LINK:
+	case CSIO_LNE_LOGO:
+		csio_ln_err(ln,
+			    "ignoring event %d recv from did x%x "
+			    "in ln state[offline].\n", evt, ln->nport_id);
+		CSIO_INC_STATS(ln, n_evt_drop);
+		break;
+
+	case CSIO_LNE_CLOSE:
+		csio_set_state(&ln->sm, csio_lns_uninit);
+		csio_post_event_rns(ln, CSIO_RNFE_CLOSE);
+		break;
+
+	default:
+		csio_ln_dbg(ln,
+			    "unexp ln event %d recv from did:x%x in "
+			    "ln state[offline]\n", evt, ln->nport_id);
+		CSIO_INC_STATS(ln, n_evt_unexp);
+		CSIO_DB_ASSERT(0);
+		break;
+	} /* switch event */
+}
+
+/*****************************************************************************/
+/* END: Lnode SM                                                             */
+/*****************************************************************************/
+
+static void
+csio_free_fcfinfo(struct kref *kref)
+{
+	struct csio_fcf_info *fcfinfo = container_of(kref,
+						struct csio_fcf_info, kref);
+	kfree(fcfinfo);
+}
+
+/* Helper routines for attributes  */
+/*
+ * csio_lnode_state_to_str - Get current state of FCOE lnode.
+ * @ln - lnode
+ * @str - state of lnode.
+ *
+ */
+void
+csio_lnode_state_to_str(struct csio_lnode *ln, int8_t *str)
+{
+	if (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_uninit)) {
+		strcpy(str, "UNINIT");
+		return;
+	}
+	if (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_ready)) {
+		strcpy(str, "READY");
+		return;
+	}
+	if (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_offline)) {
+		strcpy(str, "OFFLINE");
+		return;
+	}
+	strcpy(str, "UNKNOWN");
+} /* csio_lnode_state_to_str */
+
+
+int
+csio_get_phy_port_stats(struct csio_hw *hw, uint8_t portid,
+			struct fw_fcoe_port_stats *port_stats)
+{
+	struct csio_mb  *mbp;
+	struct fw_fcoe_port_cmd_params portparams;
+	enum fw_retval retval;
+	int idx;
+
+	mbp = mempool_alloc(hw->mb_mempool, GFP_ATOMIC);
+	if (!mbp) {
+		csio_err(hw, "FCoE FCF PARAMS command out of memory!\n");
+		return -ENOMEM;
+	}
+	portparams.portid = portid;
+
+	for (idx = 1; idx <= 3; idx++) {
+		portparams.idx = (idx-1)*6 + 1;
+		portparams.nstats = 6;
+		if (idx == 3)
+			portparams.nstats = 4;
+		csio_fcoe_read_portparams_init_mb(hw, mbp, CSIO_MB_DEFAULT_TMO,
+							&portparams, NULL);
+		if (csio_mb_issue(hw, mbp)) {
+			csio_err(hw, "Issue of FCoE port params failed!\n");
+			mempool_free(mbp, hw->mb_mempool);
+			return -EINVAL;
+		}
+		csio_mb_process_portparams_rsp(hw, mbp, &retval,
+						&portparams, port_stats);
+	}
+
+	mempool_free(mbp, hw->mb_mempool);
+	return 0;
+}
+
+/*
+ * csio_ln_mgmt_wr_handler - Mgmt Work Request handler.
+ * @wr - WR.
+ * @len - WR len.
+ * This handler is invoked when an outstanding mgmt WR is completed.
+ * It is invoked in the context of the FW event worker thread for every
+ * mgmt event received.
+ * Return - none.
+ */
+
+static void
+csio_ln_mgmt_wr_handler(struct csio_hw *hw, void *wr, uint32_t len)
+{
+	struct csio_mgmtm *mgmtm = csio_hw_to_mgmtm(hw);
+	struct csio_ioreq *io_req = NULL;
+	struct fw_fcoe_els_ct_wr *wr_cmd;
+
+
+	wr_cmd = (struct fw_fcoe_els_ct_wr *) wr;
+
+	if (len < sizeof(struct fw_fcoe_els_ct_wr)) {
+		csio_err(mgmtm->hw,
+			 "Invalid ELS CT WR length recvd, len:%x\n", len);
+		mgmtm->stats.n_err++;
+		return;
+	}
+
+	io_req = (struct csio_ioreq *) ((uintptr_t) wr_cmd->cookie);
+	io_req->wr_status = csio_wr_status(wr_cmd);
+
+	/* lookup ioreq exists in our active Q */
+	spin_lock_irq(&hw->lock);
+	if (csio_mgmt_req_lookup(mgmtm, io_req) != 0) {
+		csio_err(mgmtm->hw,
+			"Error- Invalid IO handle recv in WR. handle: %p\n",
+			io_req);
+		mgmtm->stats.n_err++;
+		spin_unlock_irq(&hw->lock);
+		return;
+	}
+
+	mgmtm = csio_hw_to_mgmtm(hw);
+
+	/* Dequeue from active queue */
+	list_del_init(&io_req->sm.sm_list);
+	mgmtm->stats.n_active--;
+	spin_unlock_irq(&hw->lock);
+
+	/* io_req will be freed by completion handler */
+	if (io_req->io_cbfn)
+		io_req->io_cbfn(hw, io_req);
+}
+
+/**
+ * csio_fcoe_fwevt_handler - Event handler for Firmware FCoE events.
+ * @hw:		HW module
+ * @cpl_op:	CPL opcode
+ * @cmd:	FW cmd/WR.
+ *
+ * Process received FCoE cmd/WR event from FW.
+ */
+void
+csio_fcoe_fwevt_handler(struct csio_hw *hw, __u8 cpl_op, __be64 *cmd)
+{
+	struct csio_lnode *ln;
+	struct csio_rnode *rn;
+	uint8_t portid, opcode = *(uint8_t *)cmd;
+	struct fw_fcoe_link_cmd *lcmd;
+	struct fw_wr_hdr *wr;
+	struct fw_rdev_wr *rdev_wr;
+	enum fw_fcoe_link_status lstatus;
+	uint32_t fcfi, rdev_flowid, vnpi;
+	enum csio_ln_ev evt;
+
+	if (cpl_op == CPL_FW6_MSG && opcode == FW_FCOE_LINK_CMD) {
+
+		lcmd = (struct fw_fcoe_link_cmd *)cmd;
+		lstatus = lcmd->lstatus;
+		portid = FW_FCOE_LINK_CMD_PORTID_GET(
+					ntohl(lcmd->op_to_portid));
+		fcfi = FW_FCOE_LINK_CMD_FCFI_GET(ntohl(lcmd->sub_opcode_fcfi));
+		vnpi = FW_FCOE_LINK_CMD_VNPI_GET(ntohl(lcmd->vnpi_pkd));
+
+		if (lstatus == FCOE_LINKUP) {
+
+			/* HW lock here */
+			spin_lock_irq(&hw->lock);
+			csio_handle_link_up(hw, portid, fcfi, vnpi);
+			spin_unlock_irq(&hw->lock);
+			/* HW un lock here */
+
+		} else if (lstatus == FCOE_LINKDOWN) {
+
+			/* HW lock here */
+			spin_lock_irq(&hw->lock);
+			csio_handle_link_down(hw, portid, fcfi, vnpi);
+			spin_unlock_irq(&hw->lock);
+			/* HW un lock here */
+		} else {
+			csio_warn(hw, "Unexpected FCOE LINK status:0x%x\n",
+				    lstatus);
+			CSIO_INC_STATS(hw, n_cpl_unexp);
+		}
+	} else if (cpl_op == CPL_FW6_PLD) {
+		wr = (struct fw_wr_hdr *) (cmd + 4);
+		if (FW_WR_OP_GET(be32_to_cpu(wr->hi))
+			== FW_RDEV_WR) {
+
+			rdev_wr = (struct fw_rdev_wr *) (cmd + 4);
+
+			rdev_flowid = FW_RDEV_WR_FLOWID_GET(
+					ntohl(rdev_wr->alloc_to_len16));
+			vnpi = FW_RDEV_WR_ASSOC_FLOWID_GET(
+				    ntohl(rdev_wr->flags_to_assoc_flowid));
+
+			csio_dbg(hw,
+				"FW_RDEV_WR: flowid:x%x ev_cause:x%x "
+				"vnpi:0x%x\n", rdev_flowid,
+				rdev_wr->event_cause, vnpi);
+
+			if (rdev_wr->protocol != PROT_FCOE) {
+				csio_err(hw,
+					"FW_RDEV_WR: invalid proto:x%x "
+					"received with flowid:x%x\n",
+					rdev_wr->protocol,
+					rdev_flowid);
+				CSIO_INC_STATS(hw, n_evt_drop);
+				return;
+			}
+
+			/* HW lock here */
+			spin_lock_irq(&hw->lock);
+			ln = csio_ln_lookup_by_vnpi(hw, vnpi);
+			if (!ln) {
+				csio_err(hw,
+					"FW_RDEV_WR: invalid vnpi:x%x received "
+					"with flowid:x%x\n", vnpi, rdev_flowid);
+				CSIO_INC_STATS(hw, n_evt_drop);
+				spin_unlock_irq(&hw->lock);
+				return;
+			}
+
+			rn = csio_confirm_rnode(ln, rdev_flowid,
+					&rdev_wr->u.fcoe_rdev);
+			if (!rn) {
+				csio_ln_dbg(ln,
+					"Failed to confirm rnode "
+					"for flowid:x%x\n", rdev_flowid);
+				CSIO_INC_STATS(hw, n_evt_drop);
+				spin_unlock_irq(&hw->lock);
+				return;
+			}
+
+			/* save previous event for debugging */
+			ln->prev_evt = ln->cur_evt;
+			ln->cur_evt = rdev_wr->event_cause;
+			CSIO_INC_STATS(ln, n_evt_fw[rdev_wr->event_cause]);
+
+			/* Translate all the fabric events to lnode SM events */
+			evt = CSIO_FWE_TO_LNE(rdev_wr->event_cause);
+			if (evt) {
+				csio_ln_dbg(ln,
+					"Posting event to lnode event:%d "
+					"cause:%d flowid:x%x\n", evt,
+					rdev_wr->event_cause, rdev_flowid);
+				csio_post_event(&ln->sm, evt);
+			}
+
+			/* Handover event to rn SM here. */
+			csio_rnode_fwevt_handler(rn, rdev_wr->event_cause);
+
+			spin_unlock_irq(&hw->lock);
+		} else {
+			csio_warn(hw, "unexpected WR op(0x%x) recv\n",
+				FW_WR_OP_GET(be32_to_cpu((wr->hi))));
+			CSIO_INC_STATS(hw, n_cpl_unexp);
+		}
+	} else if (cpl_op == CPL_FW6_MSG) {
+		wr = (struct fw_wr_hdr *) (cmd);
+		if (FW_WR_OP_GET(be32_to_cpu(wr->hi)) == FW_FCOE_ELS_CT_WR) {
+			csio_ln_mgmt_wr_handler(hw, wr,
+					sizeof(struct fw_fcoe_els_ct_wr));
+		} else {
+			csio_warn(hw, "unexpected WR op(0x%x) recv\n",
+				FW_WR_OP_GET(be32_to_cpu((wr->hi))));
+			CSIO_INC_STATS(hw, n_cpl_unexp);
+		}
+	} else {
+		csio_warn(hw, "unexpected CPL op(0x%x) recv\n", opcode);
+		CSIO_INC_STATS(hw, n_cpl_unexp);
+	}
+}
+
+/**
+ * csio_lnode_start - Kickstart lnode discovery.
+ * @ln:		lnode
+ *
+ * This routine kickstarts the discovery by issuing an FCOE_LINK (up) command.
+ */
+int
+csio_lnode_start(struct csio_lnode *ln)
+{
+	int rv = 0;
+	if (csio_is_phys_ln(ln) && !(ln->flags & CSIO_LNF_LINK_ENABLE)) {
+		rv = csio_fcoe_enable_link(ln, 1);
+		ln->flags |= CSIO_LNF_LINK_ENABLE;
+	}
+
+	return rv;
+}
+
+/**
+ * csio_lnode_stop - Stop the lnode.
+ * @ln:		lnode
+ *
+ * This routine is invoked by HW module to stop lnode and its associated NPIV
+ * lnodes.
+ */
+void
+csio_lnode_stop(struct csio_lnode *ln)
+{
+	csio_post_event_lns(ln, CSIO_LNE_DOWN_LINK);
+	if (csio_is_phys_ln(ln) && (ln->flags & CSIO_LNF_LINK_ENABLE)) {
+		csio_fcoe_enable_link(ln, 0);
+		ln->flags &= ~CSIO_LNF_LINK_ENABLE;
+	}
+	csio_ln_dbg(ln, "stopping ln :%p\n", ln);
+}
+
+/**
+ * csio_lnode_close - Close an lnode.
+ * @ln:		lnode
+ *
+ * This routine is invoked by HW module to close an lnode and its
+ * associated NPIV lnodes. Lnode and its associated NPIV lnodes are
+ * set to uninitialized state.
+ */
+void
+csio_lnode_close(struct csio_lnode *ln)
+{
+	csio_post_event_lns(ln, CSIO_LNE_CLOSE);
+	if (csio_is_phys_ln(ln))
+		ln->vnp_flowid = CSIO_INVALID_IDX;
+
+	csio_ln_dbg(ln, "closed ln :%p\n", ln);
+}
+
+/*
+ * csio_ln_prep_ecwr - Prepare ELS/CT WR.
+ * @io_req - IO request.
+ * @wr_len - WR len
+ * @immd_len - WR immediate data
+ * @sub_op - Sub opcode
+ * @sid - source portid.
+ * @did - destination portid
+ * @flow_id - flowid
+ * @fw_wr - ELS/CT WR to be prepared.
+ * Returns: 0 - on success
+ */
+static int
+csio_ln_prep_ecwr(struct csio_ioreq *io_req, uint32_t wr_len,
+		      uint32_t immd_len, uint8_t sub_op, uint32_t sid,
+		      uint32_t did, uint32_t flow_id, uint8_t *fw_wr)
+{
+	struct fw_fcoe_els_ct_wr *wr;
+	uint32_t port_id;
+
+	wr  = (struct fw_fcoe_els_ct_wr *)fw_wr;
+	wr->op_immdlen = cpu_to_be32(FW_WR_OP(FW_FCOE_ELS_CT_WR) |
+				     FW_FCOE_ELS_CT_WR_IMMDLEN(immd_len));
+
+	wr_len =  DIV_ROUND_UP(wr_len, 16);
+	wr->flowid_len16 = cpu_to_be32(FW_WR_FLOWID(flow_id) |
+					  FW_WR_LEN16(wr_len));
+	wr->els_ct_type = sub_op;
+	wr->ctl_pri = 0;
+	wr->cp_en_class = 0;
+	wr->cookie = io_req->fw_handle;
+	wr->iqid = (uint16_t)cpu_to_be16(csio_q_physiqid(
+			io_req->lnode->hwp, io_req->iq_idx));
+	wr->fl_to_sp =  FW_FCOE_ELS_CT_WR_SP(1);
+	wr->tmo_val = (uint8_t) io_req->tmo;
+	port_id = htonl(sid);
+	memcpy(wr->l_id, PORT_ID_PTR(port_id), 3);
+	port_id = htonl(did);
+	memcpy(wr->r_id, PORT_ID_PTR(port_id), 3);
+
+	/* Prepare RSP SGL */
+	wr->rsp_dmalen = cpu_to_be32(io_req->dma_buf.len);
+	wr->rsp_dmaaddr = cpu_to_be64(io_req->dma_buf.paddr);
+	return 0;
+}
+
+/*
+ * csio_ln_mgmt_submit_wr - Post elsct work request.
+ * @mgmtm - mgmtm
+ * @io_req - io request.
+ * @sub_op - ELS or CT request type
+ * @pld - Dma Payload buffer
+ * @pld_len - Payload len
+ * Prepares an ELS/CT work request and sends it to FW.
+ * Returns: 0 - on success
+ */
+static int
+csio_ln_mgmt_submit_wr(struct csio_mgmtm *mgmtm, struct csio_ioreq *io_req,
+		uint8_t sub_op, struct csio_dma_buf *pld,
+		uint32_t pld_len)
+{
+	struct csio_wr_pair wrp;
+	struct csio_lnode *ln = io_req->lnode;
+	struct csio_rnode *rn = io_req->rnode;
+	struct	csio_hw	*hw = mgmtm->hw;
+	uint8_t fw_wr[64];
+	struct ulptx_sgl dsgl;
+	uint32_t wr_size = 0;
+	uint8_t im_len = 0;
+	uint32_t wr_off = 0;
+
+	int ret = 0;
+
+	/* Calculate WR Size for this ELS REQ */
+	wr_size = sizeof(struct fw_fcoe_els_ct_wr);
+
+	/* Send as immediate data if pld < 256 */
+	if (pld_len < 256) {
+		wr_size += ALIGN(pld_len, 8);
+		im_len = (uint8_t)pld_len;
+	} else
+		wr_size += sizeof(struct ulptx_sgl);
+
+	/* Roundup WR size in units of 16 bytes */
+	wr_size = ALIGN(wr_size, 16);
+
+	/* Get WR to send ELS REQ */
+	ret = csio_wr_get(hw, mgmtm->eq_idx, wr_size, &wrp);
+	if (ret != 0) {
+		csio_err(hw, "Failed to get WR for ec_req %p ret:%d\n",
+			io_req, ret);
+		return ret;
+	}
+
+	/* Prepare Generic WR used by all ELS/CT cmd */
+	csio_ln_prep_ecwr(io_req, wr_size, im_len, sub_op,
+				ln->nport_id, rn->nport_id,
+				csio_rn_flowid(rn),
+				&fw_wr[0]);
+
+	/* Copy ELS/CT WR CMD */
+	csio_wr_copy_to_wrp(&fw_wr[0], &wrp, wr_off,
+			sizeof(struct fw_fcoe_els_ct_wr));
+	wr_off += sizeof(struct fw_fcoe_els_ct_wr);
+
+	/* Copy payload to Immediate section of WR */
+	if (im_len)
+		csio_wr_copy_to_wrp(pld->vaddr, &wrp, wr_off, im_len);
+	else {
+		/* Program DSGL to dma payload */
+		dsgl.cmd_nsge = htonl(ULPTX_CMD(ULP_TX_SC_DSGL) |
+					ULPTX_MORE | ULPTX_NSGE(1));
+		dsgl.len0 = cpu_to_be32(pld_len);
+		dsgl.addr0 = cpu_to_be64(pld->paddr);
+		csio_wr_copy_to_wrp(&dsgl, &wrp, ALIGN(wr_off, 8),
+				   sizeof(struct ulptx_sgl));
+	}
+
+	/* Issue work request to xmit ELS/CT req to FW */
+	csio_wr_issue(mgmtm->hw, mgmtm->eq_idx, false);
+	return ret;
+}
+
+/*
+ * csio_ln_mgmt_submit_req - Submit FCOE Mgmt request.
+ * @io_req - IO Request
+ * @io_cbfn - Completion handler.
+ * @req_type - ELS or CT request type
+ * @pld - Dma Payload buffer
+ * @pld_len - Payload len
+ *
+ *
+ * This API is used to submit a management ELS/CT request.
+ * This is called with the hw lock held.
+ * Returns: 0 - on success
+ *	    -ENOMEM	- on error.
+ */
+static int
+csio_ln_mgmt_submit_req(struct csio_ioreq *io_req,
+		void (*io_cbfn) (struct csio_hw *, struct csio_ioreq *),
+		enum fcoe_cmn_type req_type, struct csio_dma_buf *pld,
+		uint32_t pld_len)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(io_req->lnode);
+	struct csio_mgmtm *mgmtm = csio_hw_to_mgmtm(hw);
+	int rv;
+
+	io_req->io_cbfn = io_cbfn;	/* Upper layer callback handler */
+	io_req->fw_handle = (uintptr_t) (io_req);
+	io_req->eq_idx = mgmtm->eq_idx;
+	io_req->iq_idx = mgmtm->iq_idx;
+
+	rv = csio_ln_mgmt_submit_wr(mgmtm, io_req, req_type, pld, pld_len);
+	if (rv == 0) {
+		list_add_tail(&io_req->sm.sm_list, &mgmtm->active_q);
+		mgmtm->stats.n_active++;
+	}
+	return rv;
+}
+
+/*
+ * csio_ln_fdmi_init - FDMI Init entry point.
+ * @ln: lnode
+ */
+static int
+csio_ln_fdmi_init(struct csio_lnode *ln)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+	struct csio_dma_buf	*dma_buf;
+
+	/* Allocate MGMT request required for FDMI */
+	ln->mgmt_req = kzalloc(sizeof(struct csio_ioreq), GFP_KERNEL);
+	if (!ln->mgmt_req) {
+		csio_ln_err(ln, "Failed to alloc ioreq for FDMI\n");
+		CSIO_INC_STATS(hw, n_err_nomem);
+		return -ENOMEM;
+	}
+
+	/* Allocate Dma buffers for FDMI response Payload */
+	dma_buf = &ln->mgmt_req->dma_buf;
+	dma_buf->len = 2048;
+	dma_buf->vaddr = pci_alloc_consistent(hw->pdev, dma_buf->len,
+						&dma_buf->paddr);
+	if (!dma_buf->vaddr) {
+		csio_err(hw, "Failed to alloc DMA buffer for FDMI!\n");
+		kfree(ln->mgmt_req);
+		ln->mgmt_req = NULL;
+		return -ENOMEM;
+	}
+
+	ln->flags |= CSIO_LNF_FDMI_ENABLE;
+	return 0;
+}
+
+/*
+ * csio_ln_fdmi_exit - FDMI exit entry point.
+ * @ln: lnode
+ */
+static int
+csio_ln_fdmi_exit(struct csio_lnode *ln)
+{
+	struct csio_dma_buf *dma_buf;
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+
+	if (!ln->mgmt_req)
+		return 0;
+
+	dma_buf = &ln->mgmt_req->dma_buf;
+	if (dma_buf->vaddr)
+		pci_free_consistent(hw->pdev, dma_buf->len, dma_buf->vaddr,
+				    dma_buf->paddr);
+
+	kfree(ln->mgmt_req);
+	return 0;
+}
+
+int
+csio_scan_done(struct csio_lnode *ln, unsigned long ticks,
+		unsigned long time, unsigned long max_scan_ticks,
+		unsigned long delta_scan_ticks)
+{
+	int rv = 0;
+
+	if (time >= max_scan_ticks)
+		return 1;
+
+	if (!ln->tgt_scan_tick)
+		ln->tgt_scan_tick = ticks;
+
+	if (((ticks - ln->tgt_scan_tick) >= delta_scan_ticks)) {
+		if (!ln->last_scan_ntgts)
+			ln->last_scan_ntgts = ln->n_scsi_tgts;
+		else {
+			if (ln->last_scan_ntgts == ln->n_scsi_tgts)
+				return 1;
+
+			ln->last_scan_ntgts = ln->n_scsi_tgts;
+		}
+		ln->tgt_scan_tick = ticks;
+	}
+	return rv;
+}
+
+/*
+ * csio_notify_lnodes:
+ * @hw: HW module
+ * @note: Notification
+ *
+ * Called from the HW SM to fan out notifications to the
+ * Lnode SM. Since the HW SM is entered with lock held,
+ * there is no need to hold locks here.
+ *
+ */
+void
+csio_notify_lnodes(struct csio_hw *hw, enum csio_ln_notify note)
+{
+	struct list_head *tmp;
+	struct csio_lnode *ln;
+
+	csio_dbg(hw, "Notifying all nodes of event %d\n", note);
+
+	/* Traverse children lnodes list and send evt */
+	list_for_each(tmp, &hw->sln_head) {
+		ln = (struct csio_lnode *) tmp;
+
+		switch (note) {
+		case CSIO_LN_NOTIFY_HWREADY:
+			csio_lnode_start(ln);
+			break;
+
+		case CSIO_LN_NOTIFY_HWRESET:
+		case CSIO_LN_NOTIFY_HWREMOVE:
+			csio_lnode_close(ln);
+			break;
+
+		case CSIO_LN_NOTIFY_HWSTOP:
+			csio_lnode_stop(ln);
+			break;
+
+		default:
+			break;
+
+		}
+	}
+}
+
+/*
+ * csio_disable_lnodes:
+ * @hw: HW module
+ * @portid:port id
+ * @disable: disable/enable flag.
+ * If disable=1, disables all lnodes hosted on the given physical port;
+ * otherwise enables all the lnodes on the given physical port.
+ * This routine needs to be called with the hw lock held.
+ */
+void
+csio_disable_lnodes(struct csio_hw *hw, uint8_t portid, bool disable)
+{
+	struct list_head *tmp;
+	struct csio_lnode *ln;
+
+	csio_dbg(hw, "Notifying event to all nodes of port:%d\n", portid);
+
+	/* Traverse sibling lnodes list and send evt */
+	list_for_each(tmp, &hw->sln_head) {
+		ln = (struct csio_lnode *) tmp;
+		if (ln->portid != portid)
+			continue;
+
+		if (disable)
+			csio_lnode_stop(ln);
+		else
+			csio_lnode_start(ln);
+	}
+}
+
+/*
+ * csio_ln_init - Initialize an lnode.
+ * @ln:		lnode
+ *
+ */
+static int
+csio_ln_init(struct csio_lnode *ln)
+{
+	int rv = -EINVAL;
+	struct csio_lnode *rln, *pln;
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+
+	csio_init_state(&ln->sm, csio_lns_uninit);
+	ln->vnp_flowid = CSIO_INVALID_IDX;
+	ln->fcf_flowid = CSIO_INVALID_IDX;
+
+	if (csio_is_root_ln(ln)) {
+
+		/* This is the lnode used during initialization */
+
+		ln->fcfinfo = kzalloc(sizeof(struct csio_fcf_info), GFP_KERNEL);
+		if (!ln->fcfinfo) {
+			csio_ln_err(ln, "Failed to alloc FCF record\n");
+			CSIO_INC_STATS(hw, n_err_nomem);
+			goto err;
+		}
+
+		INIT_LIST_HEAD(&ln->fcf_lsthead);
+		kref_init(&ln->fcfinfo->kref);
+
+		if (csio_fdmi_enable && csio_ln_fdmi_init(ln))
+			goto err;
+
+	} else { /* Either a non-root physical or a virtual lnode */
+
+		/*
+		 * The rest is common for non-root physical and NPIV lnodes.
+		 * Just get references to all other modules
+		 */
+		rln = csio_root_lnode(ln);
+
+		if (csio_is_npiv_ln(ln)) {
+			/* NPIV */
+			pln = csio_parent_lnode(ln);
+			kref_get(&pln->fcfinfo->kref);
+			ln->fcfinfo = pln->fcfinfo;
+		} else {
+			/* Another non-root physical lnode (FCF) */
+			ln->fcfinfo = kzalloc(sizeof(struct csio_fcf_info),
+								GFP_KERNEL);
+			if (!ln->fcfinfo) {
+				csio_ln_err(ln,
+					"Failed to alloc FCF info\n");
+				CSIO_INC_STATS(hw, n_err_nomem);
+				goto err;
+			}
+
+			kref_init(&ln->fcfinfo->kref);
+
+			if (csio_fdmi_enable && csio_ln_fdmi_init(ln))
+				goto err;
+		}
+
+	} /* if (!csio_is_root_ln(ln)) */
+
+	return 0;
+err:
+	return rv;
+}
+
+static void
+csio_ln_exit(struct csio_lnode *ln)
+{
+	struct csio_lnode *pln;
+
+	csio_cleanup_rns(ln);
+	if (csio_is_npiv_ln(ln)) {
+		pln = csio_parent_lnode(ln);
+		kref_put(&pln->fcfinfo->kref, csio_free_fcfinfo);
+	} else {
+		kref_put(&ln->fcfinfo->kref, csio_free_fcfinfo);
+		if (csio_fdmi_enable)
+			csio_ln_fdmi_exit(ln);
+	}
+	ln->fcfinfo = NULL;
+}
+
+/**
+ * csio_lnode_init - Initialize the members of an lnode.
+ * @ln:		lnode
+ *
+ */
+int
+csio_lnode_init(struct csio_lnode *ln, struct csio_hw *hw,
+		struct csio_lnode *pln)
+{
+	int rv = -EINVAL;
+
+	/* Link this lnode to hw */
+	csio_lnode_to_hw(ln)	= hw;
+
+	/* Link child to parent if child lnode */
+	if (pln)
+		ln->pln = pln;
+	else
+		ln->pln = NULL;
+
+	/* Initialize scsi_tgt and timers to zero */
+	ln->n_scsi_tgts = 0;
+	ln->last_scan_ntgts = 0;
+	ln->tgt_scan_tick = 0;
+
+	/* Initialize rnode list */
+	INIT_LIST_HEAD(&ln->rnhead);
+	INIT_LIST_HEAD(&ln->cln_head);
+
+	/* Initialize log level for debug */
+	ln->params.log_level	= hw->params.log_level;
+
+	if (csio_ln_init(ln))
+		goto err;
+
+	/* Add lnode to list of sibling or children lnodes */
+	spin_lock_irq(&hw->lock);
+	list_add_tail(&ln->sm.sm_list, pln ? &pln->cln_head : &hw->sln_head);
+	if (pln)
+		pln->num_vports++;
+	spin_unlock_irq(&hw->lock);
+
+	hw->num_lns++;
+
+	return 0;
+err:
+	csio_lnode_to_hw(ln) = NULL;
+	return rv;
+}
+
+/**
+ * csio_lnode_exit - De-instantiate an lnode.
+ * @ln:		lnode
+ *
+ */
+void
+csio_lnode_exit(struct csio_lnode *ln)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+
+	csio_ln_exit(ln);
+
+	/* Remove this lnode from hw->sln_head */
+	spin_lock_irq(&hw->lock);
+
+	list_del_init(&ln->sm.sm_list);
+
+	/* If it is a child lnode, decrement the
+	 * counter in its parent lnode
+	 */
+	if (ln->pln)
+		ln->pln->num_vports--;
+
+	/* Update root lnode pointer */
+	if (list_empty(&hw->sln_head))
+		hw->rln = NULL;
+	else
+		hw->rln = (struct csio_lnode *)csio_list_next(&hw->sln_head);
+
+	spin_unlock_irq(&hw->lock);
+
+	csio_lnode_to_hw(ln)	= NULL;
+	hw->num_lns--;
+}
diff --git a/drivers/scsi/csiostor/csio_rnode.c b/drivers/scsi/csiostor/csio_rnode.c
new file mode 100644
index 0000000..5e224a0
--- /dev/null
+++ b/drivers/scsi/csiostor/csio_rnode.c
@@ -0,0 +1,889 @@
+/*
+ * This file is part of the Chelsio FCoE driver for Linux.
+ *
+ * Copyright (c) 2008-2012 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/string.h>
+#include <scsi/scsi_transport_fc.h>
+#include <scsi/fc/fc_els.h>
+#include <scsi/fc/fc_fs.h>
+
+#include "csio_hw.h"
+#include "csio_lnode.h"
+#include "csio_rnode.h"
+
+static int csio_rnode_init(struct csio_rnode *, struct csio_lnode *);
+static void csio_rnode_exit(struct csio_rnode *);
+
+/* State machine forward declarations */
+static void csio_rns_uninit(struct csio_rnode *, enum csio_rn_ev);
+static void csio_rns_ready(struct csio_rnode *, enum csio_rn_ev);
+static void csio_rns_offline(struct csio_rnode *, enum csio_rn_ev);
+static void csio_rns_disappeared(struct csio_rnode *, enum csio_rn_ev);
+
+/* RNF event mapping */
+static enum csio_rn_ev fwevt_to_rnevt[] = {
+	CSIO_RNFE_NONE,		/* None */
+	CSIO_RNFE_LOGGED_IN,	/* PLOGI_ACC_RCVD  */
+	CSIO_RNFE_NONE,		/* PLOGI_RJT_RCVD  */
+	CSIO_RNFE_PLOGI_RECV,	/* PLOGI_RCVD	   */
+	CSIO_RNFE_LOGO_RECV,	/* PLOGO_RCVD	   */
+	CSIO_RNFE_PRLI_DONE,	/* PRLI_ACC_RCVD   */
+	CSIO_RNFE_NONE,		/* PRLI_RJT_RCVD   */
+	CSIO_RNFE_PRLI_RECV,	/* PRLI_RCVD	   */
+	CSIO_RNFE_PRLO_RECV,	/* PRLO_RCVD	   */
+	CSIO_RNFE_NONE,		/* NPORT_ID_CHGD   */
+	CSIO_RNFE_LOGO_RECV,	/* FLOGO_RCVD	   */
+	CSIO_RNFE_NONE,		/* CLR_VIRT_LNK_RCVD */
+	CSIO_RNFE_LOGGED_IN,	/* FLOGI_ACC_RCVD   */
+	CSIO_RNFE_NONE,		/* FLOGI_RJT_RCVD   */
+	CSIO_RNFE_LOGGED_IN,	/* FDISC_ACC_RCVD   */
+	CSIO_RNFE_NONE,		/* FDISC_RJT_RCVD   */
+	CSIO_RNFE_NONE,		/* FLOGI_TMO_MAX_RETRY */
+	CSIO_RNFE_NONE,		/* IMPL_LOGO_ADISC_ACC */
+	CSIO_RNFE_NONE,		/* IMPL_LOGO_ADISC_RJT */
+	CSIO_RNFE_NONE,		/* IMPL_LOGO_ADISC_CNFLT */
+	CSIO_RNFE_NONE,		/* PRLI_TMO		*/
+	CSIO_RNFE_NONE,		/* ADISC_TMO		*/
+	CSIO_RNFE_NAME_MISSING,	/* RSCN_DEV_LOST  */
+	CSIO_RNFE_NONE,		/* SCR_ACC_RCVD	*/
+	CSIO_RNFE_NONE,		/* ADISC_RJT_RCVD */
+	CSIO_RNFE_NONE,		/* LOGO_SNT */
+	CSIO_RNFE_LOGO_RECV,	/* PROTO_ERR_IMPL_LOGO */
+};
+
+#define CSIO_FWE_TO_RNFE(_evt)	((_evt > PROTO_ERR_IMPL_LOGO) ?		\
+						CSIO_RNFE_NONE :	\
+						fwevt_to_rnevt[_evt])
+int
+csio_is_rnode_ready(struct csio_rnode *rn)
+{
+	return csio_match_state(rn, csio_rns_ready);
+}
+
+static int
+csio_is_rnode_uninit(struct csio_rnode *rn)
+{
+	return csio_match_state(rn, csio_rns_uninit);
+}
+
+/*
+ * csio_rn_lookup - Finds the rnode with the given flowid
+ * @ln - lnode
+ * @flowid - flowid.
+ *
+ * Does the rnode lookup on the given lnode and flowid. If no matching entry
+ * is found, NULL is returned.
+ */
+static struct csio_rnode *
+csio_rn_lookup(struct csio_lnode *ln, uint32_t flowid)
+{
+	struct csio_rnode *rnhead = (struct csio_rnode *) &ln->rnhead;
+	struct list_head *tmp;
+	struct csio_rnode *rn;
+
+	list_for_each(tmp, &rnhead->sm.sm_list) {
+		rn = (struct csio_rnode *) tmp;
+		if (rn->flowid == flowid)
+			return rn;
+	}
+
+	return NULL;
+}
+
+/*
+ * csio_rn_lookup_wwpn - Finds the rnode with the given wwpn
+ * @ln: lnode
+ * @wwpn: wwpn
+ *
+ * Does the rnode lookup on the given lnode and wwpn. If no matching entry
+ * is found, NULL is returned.
+ */
+static struct csio_rnode *
+csio_rn_lookup_wwpn(struct csio_lnode *ln, uint8_t *wwpn)
+{
+	struct csio_rnode *rnhead = (struct csio_rnode *) &ln->rnhead;
+	struct list_head *tmp;
+	struct csio_rnode *rn;
+
+	list_for_each(tmp, &rnhead->sm.sm_list) {
+		rn = (struct csio_rnode *) tmp;
+		if (!memcmp(csio_rn_wwpn(rn), wwpn, 8))
+			return rn;
+	}
+
+	return NULL;
+}
+
+/**
+ * csio_rnode_lookup_portid - Finds the rnode with the given portid
+ * @ln:		lnode
+ * @portid:	port id
+ *
+ * Look up the rnode list for a given portid. If no matching entry
+ * is found, NULL is returned.
+ */
+struct csio_rnode *
+csio_rnode_lookup_portid(struct csio_lnode *ln, uint32_t portid)
+{
+	struct csio_rnode *rnhead = (struct csio_rnode *) &ln->rnhead;
+	struct list_head *tmp;
+	struct csio_rnode *rn;
+
+	list_for_each(tmp, &rnhead->sm.sm_list) {
+		rn = (struct csio_rnode *) tmp;
+		if (rn->nport_id == portid)
+			return rn;
+	}
+
+	return NULL;
+}
+
+static int
+csio_rn_dup_flowid(struct csio_lnode *ln, uint32_t rdev_flowid,
+		    uint32_t *vnp_flowid)
+{
+	struct csio_rnode *rnhead;
+	struct list_head *tmp, *tmp1;
+	struct csio_rnode *rn;
+	struct csio_lnode *ln_tmp;
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+
+	list_for_each(tmp1, &hw->sln_head) {
+		ln_tmp = (struct csio_lnode *) tmp1;
+		if (ln_tmp == ln)
+			continue;
+
+		rnhead = (struct csio_rnode *)&ln_tmp->rnhead;
+		list_for_each(tmp, &rnhead->sm.sm_list) {
+
+			rn = (struct csio_rnode *) tmp;
+			if (csio_is_rnode_ready(rn)) {
+				if (rn->flowid == rdev_flowid) {
+					*vnp_flowid = csio_ln_flowid(ln_tmp);
+					return 1;
+				}
+			}
+		}
+	}
+
+	return 0;
+}
+
+static struct csio_rnode *
+csio_alloc_rnode(struct csio_lnode *ln)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(ln);
+
+	struct csio_rnode *rn = mempool_alloc(hw->rnode_mempool, GFP_ATOMIC);
+	if (!rn)
+		goto err;
+
+	memset(rn, 0, sizeof(struct csio_rnode));
+	if (csio_rnode_init(rn, ln))
+		goto err_free;
+
+	CSIO_INC_STATS(ln, n_rnode_alloc);
+
+	return rn;
+
+err_free:
+	mempool_free(rn, hw->rnode_mempool);
+err:
+	CSIO_INC_STATS(ln, n_rnode_nomem);
+	return NULL;
+}
+
+static void
+csio_free_rnode(struct csio_rnode *rn)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(csio_rnode_to_lnode(rn));
+
+	csio_rnode_exit(rn);
+	CSIO_INC_STATS(rn->lnp, n_rnode_free);
+	mempool_free(rn, hw->rnode_mempool);
+}
+
+/*
+ * csio_get_rnode - Gets rnode with the given flowid
+ * @ln - lnode
+ * @flowid - flow id.
+ *
+ * Does the rnode lookup on the given lnode and flowid. If no matching
+ * rnode is found, a new rnode with the given flowid is allocated and returned.
+ */
+static struct csio_rnode *
+csio_get_rnode(struct csio_lnode *ln, uint32_t flowid)
+{
+	struct csio_rnode *rn;
+
+	rn = csio_rn_lookup(ln, flowid);
+	if (!rn) {
+		rn = csio_alloc_rnode(ln);
+		if (!rn)
+			return NULL;
+
+		rn->flowid = flowid;
+	}
+
+	return rn;
+}
+
+/*
+ * csio_put_rnode - Frees the given rnode
+ * @ln - lnode
+ * @rn - rnode
+ *
+ * Frees the given rnode. The rnode is expected to be in the uninit
+ * state when this is called.
+ */
+void
+csio_put_rnode(struct csio_lnode *ln, struct csio_rnode *rn)
+{
+	CSIO_DB_ASSERT(csio_is_rnode_uninit(rn) != 0);
+	csio_free_rnode(rn);
+}
+
+/*
+ * csio_confirm_rnode - confirms rnode based on wwpn.
+ * @ln: lnode
+ * @rdev_flowid: remote device flowid
+ * @rdevp: remote device params
+ * This routine searches the list for another rnode with the same wwpn as
+ * the new rnode. If there is a match, the matched rnode is returned;
+ * otherwise a new rnode is returned.
+ * Returns the rnode, or NULL if the event is dropped or allocation fails.
+ */
+struct csio_rnode *
+csio_confirm_rnode(struct csio_lnode *ln, uint32_t rdev_flowid,
+		   struct fcoe_rdev_entry *rdevp)
+{
+	uint8_t rport_type;
+	struct csio_rnode *rn, *match_rn;
+	uint32_t vnp_flowid;
+
+	rport_type =
+		FW_RDEV_WR_RPORT_TYPE_GET(rdevp->rd_xfer_rdy_to_rport_type);
+	/* Drop rdev event for cntrl port */
+	if (rport_type == FAB_CTLR_VNPORT) {
+		csio_ln_dbg(ln,
+			    "Unhandled rport_type:%d recv in rdev evt "
+			    "ssni:x%x\n", rport_type, rdev_flowid);
+		return NULL;
+	}
+
+	/* Lookup on flowid */
+	rn = csio_rn_lookup(ln, rdev_flowid);
+	if (!rn) {
+
+		/* Drop events with duplicate flowid */
+		if (csio_rn_dup_flowid(ln, rdev_flowid, &vnp_flowid)) {
+			csio_ln_warn(ln,
+				     "ssni:%x already active on vnpi:%x\n",
+				     rdev_flowid, vnp_flowid);
+			return NULL;
+		}
+
+		/* skip wwpn lookup for fabric ports, cntrl port */
+		if (rport_type == FLOGI_VFPORT || rport_type == FDISC_VFPORT
+		    || rport_type == FAB_CTLR_VNPORT) {
+			goto alloc_rnode;
+		}
+
+		/* Lookup on wwpn for NPORTs */
+		rn = csio_rn_lookup_wwpn(ln, rdevp->wwpn);
+		if (!rn)
+			goto alloc_rnode;
+
+		/* found rn */
+		goto found_rnode;
+	} else {
+		/* verify rnode found for fabric ports, cntrl port */
+		if (rport_type == FLOGI_VFPORT || rport_type == FDISC_VFPORT
+		    || rport_type == FAB_CTLR_VNPORT) {
+
+			/* Rnode role mismatch. Allocate new rnode */
+			if (rn->role == CSIO_RNFR_NS ||
+			    rn->role == CSIO_RNFR_NPORT) {
+				csio_ln_dbg(ln,
+					"rnode role mismatch found ssni:x%x "
+					"role:%d new_type:%d\n",
+					rdev_flowid, rn->role, rport_type);
+				if (csio_is_rnode_ready(rn)) {
+					csio_ln_warn(ln,
+						     "rnode is already "
+						     "active ssni:x%x\n",
+						     rdev_flowid);
+					CSIO_DB_ASSERT(0);
+				}
+				csio_rn_flowid(rn) = CSIO_INVALID_IDX;
+				goto alloc_rnode;
+			} else
+				goto found_rnode;
+		}
+
+		/* wwpn match */
+		if (!memcmp(csio_rn_wwpn(rn), rdevp->wwpn, 8)) {
+			/* Update rn */
+			goto found_rnode;
+		}
+
+		/* Search for an rnode that has the same wwpn */
+		match_rn = csio_rn_lookup_wwpn(ln, rdevp->wwpn);
+		if (match_rn != NULL) {
+			csio_ln_dbg(ln,
+				"ssni:x%x changed for rport name(wwpn):%llx "
+				"did:x%x\n", rdev_flowid,
+				wwn_to_u64(rdevp->wwpn),
+				match_rn->nport_id);
+			csio_rn_flowid(rn) = CSIO_INVALID_IDX;
+			rn = match_rn;
+			CSIO_INC_STATS(ln, n_rnode_match);
+		} else {
+			csio_ln_dbg(ln,
+				"rnode wwpn mismatch found ssni:x%x "
+				"name(wwpn):%llx\n",
+				rdev_flowid,
+				wwn_to_u64(csio_rn_wwpn(rn)));
+			if (csio_is_rnode_ready(rn)) {
+				csio_ln_warn(ln,
+					     "rnode is already active "
+					     "wwpn:%llx ssni:x%x\n",
+					     wwn_to_u64(csio_rn_wwpn(rn)),
+					     rdev_flowid);
+				CSIO_DB_ASSERT(0);
+			}
+			csio_rn_flowid(rn) = CSIO_INVALID_IDX;
+			goto alloc_rnode;
+		}
+	}
+
+found_rnode:
+	csio_ln_dbg(ln, "found rnode:%p ssni:x%x name(wwpn):%llx\n",
+		rn, rdev_flowid, wwn_to_u64(rdevp->wwpn));
+
+	/* Update flowid */
+	csio_rn_flowid(rn) = rdev_flowid;
+
+	/* update rdev entry */
+	rn->rdev_entry = rdevp;
+	return rn;
+
+alloc_rnode:
+	rn = csio_get_rnode(ln, rdev_flowid);
+	if (!rn)
+		return NULL;
+
+	csio_ln_dbg(ln, "alloc rnode:%p ssni:x%x name(wwpn):%llx\n",
+		rn, rdev_flowid, wwn_to_u64(rdevp->wwpn));
+
+	/* update rdev entry */
+	rn->rdev_entry = rdevp;
+	return rn;
+}
+
+/*
+ * csio_rn_verify_rparams - verify rparams.
+ * @ln: lnode
+ * @rn: rnode
+ * @rdevp: remote device params
+ * Returns 0 if the rparams are verified, -EINVAL otherwise.
+ */
+static int
+csio_rn_verify_rparams(struct csio_lnode *ln, struct csio_rnode *rn,
+			struct fcoe_rdev_entry *rdevp)
+{
+	uint8_t null[8];
+	uint8_t rport_type;
+	uint8_t fc_class;
+	uint32_t *did;
+
+	did = (uint32_t *) &rdevp->r_id[0];
+	rport_type =
+		FW_RDEV_WR_RPORT_TYPE_GET(rdevp->rd_xfer_rdy_to_rport_type);
+	switch (rport_type) {
+	case FLOGI_VFPORT:
+		rn->role = CSIO_RNFR_FABRIC;
+		if (((ntohl(*did) >> 8) & CSIO_DID_MASK) != FC_FID_FLOGI) {
+			csio_ln_err(ln, "ssni:x%x invalid fabric portid\n",
+				csio_rn_flowid(rn));
+			return -EINVAL;
+		}
+		/* NPIV support */
+		if (FW_RDEV_WR_NPIV_GET(rdevp->vft_to_qos))
+			ln->flags |= CSIO_LNF_NPIVSUPP;
+
+		break;
+
+	case NS_VNPORT:
+		rn->role = CSIO_RNFR_NS;
+		if (((ntohl(*did) >> 8) & CSIO_DID_MASK) != FC_FID_DIR_SERV) {
+			csio_ln_err(ln, "ssni:x%x invalid fabric portid\n",
+				csio_rn_flowid(rn));
+			return -EINVAL;
+		}
+		break;
+
+	case REG_FC4_VNPORT:
+	case REG_VNPORT:
+		rn->role = CSIO_RNFR_NPORT;
+		if (rdevp->event_cause == PRLI_ACC_RCVD ||
+			rdevp->event_cause == PRLI_RCVD) {
+			if (FW_RDEV_WR_TASK_RETRY_ID_GET(
+							rdevp->enh_disc_to_tgt))
+				rn->fcp_flags |= FCP_SPPF_OVLY_ALLOW;
+
+			if (FW_RDEV_WR_RETRY_GET(rdevp->enh_disc_to_tgt))
+				rn->fcp_flags |= FCP_SPPF_RETRY;
+
+			if (FW_RDEV_WR_CONF_CMPL_GET(rdevp->enh_disc_to_tgt))
+				rn->fcp_flags |= FCP_SPPF_CONF_COMPL;
+
+			if (FW_RDEV_WR_TGT_GET(rdevp->enh_disc_to_tgt))
+				rn->role |= CSIO_RNFR_TARGET;
+
+			if (FW_RDEV_WR_INI_GET(rdevp->enh_disc_to_tgt))
+				rn->role |= CSIO_RNFR_INITIATOR;
+		}
+
+		break;
+
+	case FDMI_VNPORT:
+	case FAB_CTLR_VNPORT:
+		rn->role = 0;
+		break;
+
+	default:
+		csio_ln_err(ln, "ssni:x%x invalid rport type recv x%x\n",
+			csio_rn_flowid(rn), rport_type);
+		return -EINVAL;
+	}
+
+	/* validate wwpn/wwnn for Name server/remote port */
+	if (rport_type == REG_VNPORT || rport_type == NS_VNPORT) {
+		memset(null, 0, 8);
+		if (!memcmp(rdevp->wwnn, null, 8)) {
+			csio_ln_err(ln,
+				    "ssni:x%x invalid wwnn received from"
+				    " rport did:x%x\n",
+				    csio_rn_flowid(rn),
+				    (ntohl(*did) & CSIO_DID_MASK));
+			return -EINVAL;
+		}
+
+		if (!memcmp(rdevp->wwpn, null, 8)) {
+			csio_ln_err(ln,
+				    "ssni:x%x invalid wwpn received from"
+				    " rport did:x%x\n",
+				    csio_rn_flowid(rn),
+				    (ntohl(*did) & CSIO_DID_MASK));
+			return -EINVAL;
+		}
+
+	}
+
+	/* Copy wwnn, wwpn and nport id */
+	rn->nport_id = (ntohl(*did) >> 8) & CSIO_DID_MASK;
+	memcpy(csio_rn_wwnn(rn), rdevp->wwnn, 8);
+	memcpy(csio_rn_wwpn(rn), rdevp->wwpn, 8);
+	rn->rn_sparm.csp.sp_bb_data = ntohs(rdevp->rcv_fr_sz);
+	fc_class = FW_RDEV_WR_CLASS_GET(rdevp->vft_to_qos);
+	rn->rn_sparm.clsp[fc_class - 1].cp_class = htons(FC_CPC_VALID);
+	return 0;
+}
+
+static void
+__csio_reg_rnode(struct csio_rnode *rn)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(csio_rnode_to_lnode(rn));
+	struct csio_lnode *ln = csio_rnode_to_lnode(rn);
+
+	spin_unlock_irq(&hw->lock);
+	csio_reg_rnode(rn);
+	spin_lock_irq(&hw->lock);
+
+	if (rn->nport_id == FC_FID_MGMT_SERV)
+		csio_ln_fdmi_start(ln, (void *) rn);
+}
+
+static void
+__csio_unreg_rnode(struct csio_rnode *rn)
+{
+	struct csio_hw *hw = csio_lnode_to_hw(csio_rnode_to_lnode(rn));
+	LIST_HEAD(tmp_q);
+	int cmpl = 0;
+
+	if (!list_empty(&rn->host_cmpl_q)) {
+		csio_dbg(hw, "Returning completion queue I/Os\n");
+		list_splice_tail_init(&rn->host_cmpl_q, &tmp_q);
+		cmpl = 1;
+	}
+
+	spin_unlock_irq(&hw->lock);
+	csio_unreg_rnode(rn);
+	spin_lock_irq(&hw->lock);
+
+	/* Cleanup I/Os that were waiting for rnode to unregister */
+	if (cmpl)
+		csio_scsi_cleanup_io_q(csio_hw_to_scsim(hw), &tmp_q);
+
+}
+
+/*****************************************************************************/
+/* START: Rnode SM                                                           */
+/*****************************************************************************/
+
+/*
+ * csio_rns_uninit -
+ * @rn - rnode
+ * @evt - SM event.
+ *
+ */
+static void
+csio_rns_uninit(struct csio_rnode *rn, enum csio_rn_ev evt)
+{
+	struct csio_lnode *ln = csio_rnode_to_lnode(rn);
+	int ret = 0;
+
+	CSIO_INC_STATS(rn, n_evt_sm[evt]);
+
+	switch (evt) {
+	case CSIO_RNFE_LOGGED_IN:
+	case CSIO_RNFE_PLOGI_RECV:
+		ret = csio_rn_verify_rparams(ln, rn, rn->rdev_entry);
+		if (!ret) {
+			csio_set_state(&rn->sm, csio_rns_ready);
+			__csio_reg_rnode(rn);
+		} else {
+			CSIO_INC_STATS(rn, n_err_inval);
+		}
+		break;
+	case CSIO_RNFE_LOGO_RECV:
+		csio_ln_dbg(ln,
+			    "ssni:x%x Ignoring event %d recv "
+			    "in rn state[uninit]\n", csio_rn_flowid(rn), evt);
+		CSIO_INC_STATS(rn, n_evt_drop);
+		break;
+	default:
+		csio_ln_dbg(ln,
+			    "ssni:x%x unexp event %d recv "
+			    "in rn state[uninit]\n", csio_rn_flowid(rn), evt);
+		CSIO_INC_STATS(rn, n_evt_unexp);
+		break;
+	}
+}
+
+/*
+ * csio_rns_ready -
+ * @rn - rnode
+ * @evt - SM event.
+ *
+ */
+static void
+csio_rns_ready(struct csio_rnode *rn, enum csio_rn_ev evt)
+{
+	struct csio_lnode *ln = csio_rnode_to_lnode(rn);
+	int ret = 0;
+
+	CSIO_INC_STATS(rn, n_evt_sm[evt]);
+
+	switch (evt) {
+	case CSIO_RNFE_LOGGED_IN:
+	case CSIO_RNFE_PLOGI_RECV:
+		csio_ln_dbg(ln,
+			"ssni:x%x Ignoring event %d recv from did:x%x "
+			"in rn state[ready]\n", csio_rn_flowid(rn), evt,
+			rn->nport_id);
+		CSIO_INC_STATS(rn, n_evt_drop);
+		break;
+
+	case CSIO_RNFE_PRLI_DONE:
+	case CSIO_RNFE_PRLI_RECV:
+		ret = csio_rn_verify_rparams(ln, rn, rn->rdev_entry);
+		if (!ret)
+			__csio_reg_rnode(rn);
+		else
+			CSIO_INC_STATS(rn, n_err_inval);
+
+		break;
+	case CSIO_RNFE_DOWN:
+		csio_set_state(&rn->sm, csio_rns_offline);
+		__csio_unreg_rnode(rn);
+
+		/* FW is expected to internally abort outstanding SCSI WRs
+		 * and return all SCSI WRs to host with status "ABORTED".
+		 */
+		break;
+
+	case CSIO_RNFE_LOGO_RECV:
+		csio_set_state(&rn->sm, csio_rns_offline);
+
+		__csio_unreg_rnode(rn);
+
+		/* FW is expected to internally abort outstanding SCSI WRs
+		 * and return all SCSI WRs to host with status "ABORTED".
+		 */
+		break;
+
+	case CSIO_RNFE_CLOSE:
+		/*
+		 * Each rnode receives CLOSE event when driver is removed or
+		 * device is reset.
+		 * Note: All outstanding IOs on remote port need to be returned
+		 * to upper layer with appropriate error before sending
+		 * CLOSE event.
+		 */
+		csio_set_state(&rn->sm, csio_rns_uninit);
+		__csio_unreg_rnode(rn);
+		break;
+
+	case CSIO_RNFE_NAME_MISSING:
+		csio_set_state(&rn->sm, csio_rns_disappeared);
+		__csio_unreg_rnode(rn);
+
+		/*
+		 * FW is expected to internally abort outstanding SCSI WRs
+		 * and return all SCSI WRs to host with status "ABORTED".
+		 */
+
+		break;
+
+	default:
+		csio_ln_dbg(ln,
+			"ssni:x%x unexp event %d recv from did:x%x "
+			"in rn state[ready]\n", csio_rn_flowid(rn), evt,
+			rn->nport_id);
+		CSIO_INC_STATS(rn, n_evt_unexp);
+		break;
+	}
+}
+
+/*
+ * csio_rns_offline -
+ * @rn - rnode
+ * @evt - SM event.
+ *
+ */
+static void
+csio_rns_offline(struct csio_rnode *rn, enum csio_rn_ev evt)
+{
+	struct csio_lnode *ln = csio_rnode_to_lnode(rn);
+	int ret = 0;
+
+	CSIO_INC_STATS(rn, n_evt_sm[evt]);
+
+	switch (evt) {
+	case CSIO_RNFE_LOGGED_IN:
+	case CSIO_RNFE_PLOGI_RECV:
+		ret = csio_rn_verify_rparams(ln, rn, rn->rdev_entry);
+		if (!ret) {
+			csio_set_state(&rn->sm, csio_rns_ready);
+			__csio_reg_rnode(rn);
+		} else {
+			CSIO_INC_STATS(rn, n_err_inval);
+			csio_post_event(&rn->sm, CSIO_RNFE_CLOSE);
+		}
+		break;
+
+	case CSIO_RNFE_DOWN:
+		csio_ln_dbg(ln,
+			"ssni:x%x Ignoring event %d recv from did:x%x "
+			"in rn state[offline]\n", csio_rn_flowid(rn), evt,
+			rn->nport_id);
+		CSIO_INC_STATS(rn, n_evt_drop);
+		break;
+
+	case CSIO_RNFE_CLOSE:
+		/* Each rnode receives CLOSE event when driver is removed or
+		 * device is reset.
+		 * Note: All outstanding IOs on remote port need to be returned
+		 * to upper layer with appropriate error before sending
+		 * CLOSE event.
+		 */
+		csio_set_state(&rn->sm, csio_rns_uninit);
+		break;
+
+	case CSIO_RNFE_NAME_MISSING:
+		csio_set_state(&rn->sm, csio_rns_disappeared);
+		break;
+
+	default:
+		csio_ln_dbg(ln,
+			"ssni:x%x unexp event %d recv from did:x%x "
+			"in rn state[offline]\n", csio_rn_flowid(rn), evt,
+			rn->nport_id);
+		CSIO_INC_STATS(rn, n_evt_unexp);
+		break;
+	}
+}
+
+/*
+ * csio_rns_disappeared -
+ * @rn - rnode
+ * @evt - SM event.
+ *
+ */
+static void
+csio_rns_disappeared(struct csio_rnode *rn, enum csio_rn_ev evt)
+{
+	struct csio_lnode *ln = csio_rnode_to_lnode(rn);
+	int ret = 0;
+
+	CSIO_INC_STATS(rn, n_evt_sm[evt]);
+
+	switch (evt) {
+	case CSIO_RNFE_LOGGED_IN:
+	case CSIO_RNFE_PLOGI_RECV:
+		ret = csio_rn_verify_rparams(ln, rn, rn->rdev_entry);
+		if (!ret) {
+			csio_set_state(&rn->sm, csio_rns_ready);
+			__csio_reg_rnode(rn);
+		} else {
+			CSIO_INC_STATS(rn, n_err_inval);
+			csio_post_event(&rn->sm, CSIO_RNFE_CLOSE);
+		}
+		break;
+
+	case CSIO_RNFE_CLOSE:
+		/* Each rnode receives CLOSE event when driver is removed or
+		 * device is reset.
+		 * Note: All outstanding IOs on remote port need to be returned
+		 * to upper layer with appropriate error before sending
+		 * CLOSE event.
+		 */
+		csio_set_state(&rn->sm, csio_rns_uninit);
+		break;
+
+	case CSIO_RNFE_DOWN:
+	case CSIO_RNFE_NAME_MISSING:
+		csio_ln_dbg(ln,
+			"ssni:x%x Ignoring event %d recv from did:x%x "
+			"in rn state[disappeared]\n", csio_rn_flowid(rn),
+			evt, rn->nport_id);
+		break;
+
+	default:
+		csio_ln_dbg(ln,
+			"ssni:x%x unexp event %d recv from did:x%x "
+			"in rn state[disappeared]\n", csio_rn_flowid(rn),
+			evt, rn->nport_id);
+		CSIO_INC_STATS(rn, n_evt_unexp);
+		break;
+	}
+}
+
+/*****************************************************************************/
+/* END: Rnode SM                                                             */
+/*****************************************************************************/
+
+/*
+ * csio_rnode_devloss_handler - Device loss event handler
+ * @rn: rnode
+ *
+ * Post event to close rnode SM and free rnode.
+ */
+void
+csio_rnode_devloss_handler(struct csio_rnode *rn)
+{
+	struct csio_lnode *ln = csio_rnode_to_lnode(rn);
+
+	/* ignore if same rnode came back as online */
+	if (csio_is_rnode_ready(rn))
+		return;
+
+	csio_post_event(&rn->sm, CSIO_RNFE_CLOSE);
+
+	/* Free rn if in uninit state */
+	if (csio_is_rnode_uninit(rn))
+		csio_put_rnode(ln, rn);
+}
+
+/**
+ * csio_rnode_fwevt_handler - Event handler for firmware rnode events.
+ * @rn:		rnode
+ * @fwevt:	firmware event
+ */
+void
+csio_rnode_fwevt_handler(struct csio_rnode *rn, uint8_t fwevt)
+{
+	struct csio_lnode *ln = csio_rnode_to_lnode(rn);
+	enum csio_rn_ev evt;
+
+	evt = CSIO_FWE_TO_RNFE(fwevt);
+	if (!evt) {
+		csio_ln_err(ln, "ssni:x%x Unhandled FW Rdev event: %d\n",
+			    csio_rn_flowid(rn), fwevt);
+		CSIO_INC_STATS(rn, n_evt_unexp);
+		return;
+	}
+	CSIO_INC_STATS(rn, n_evt_fw[fwevt]);
+
+	/* Track previous & current events for debugging */
+	rn->prev_evt = rn->cur_evt;
+	rn->cur_evt = fwevt;
+
+	/* Post event to rnode SM */
+	csio_post_event(&rn->sm, evt);
+
+	/* Free rn if in uninit state */
+	if (csio_is_rnode_uninit(rn))
+		csio_put_rnode(ln, rn);
+}
+
+/*
+ * csio_rnode_init - Initialize rnode.
+ * @rn: RNode
+ * @ln: Associated lnode
+ *
+ * Caller is responsible for holding the lock. The lock is required
+ * to be held for inserting the rnode in ln->rnhead list.
+ */
+static int
+csio_rnode_init(struct csio_rnode *rn, struct csio_lnode *ln)
+{
+	csio_rnode_to_lnode(rn) = ln;
+	csio_init_state(&rn->sm, csio_rns_uninit);
+	INIT_LIST_HEAD(&rn->host_cmpl_q);
+	csio_rn_flowid(rn) = CSIO_INVALID_IDX;
+
+	/* Add rnode to list of lnodes->rnhead */
+	list_add_tail(&rn->sm.sm_list, &ln->rnhead);
+
+	return 0;
+}
+
+static void
+csio_rnode_exit(struct csio_rnode *rn)
+{
+	list_del_init(&rn->sm.sm_list);
+	CSIO_DB_ASSERT(list_empty(&rn->host_cmpl_q));
+}
-- 
1.7.1
