Netdev List
 help / color / mirror / Atom feed
* [PATCHv2] Socket filter ancilliary data access for skb->dev->type
From: Paul LeoNerd Evans @ 2010-04-22 13:32 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1712 bytes --]

Add an SKF_AD_HATYPE field to the packet ancilliary data area, giving
access to skb->dev->type, as reported in the sll_hatype field.

When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
interfaces, there doesn't appear to be a way for the filter program to
actually find out the underlying hardware type the packet was captured
on. This patch adds such ability.

This patch also handles the case where skb->dev can be NULL, such as on
netlink sockets.

Signed-off-by: Paul Evans <leonerd@leonerd.org.uk>

---

diff -ur linux-2.6.33.2.orig/include/linux/filter.h linux-2.6.33.2/include/linux/filter.h
--- linux-2.6.33.2.orig/include/linux/filter.h	2010-04-02 00:02:33.000000000 +0100
+++ linux-2.6.33.2/include/linux/filter.h	2010-04-20 22:40:25.000000000 +0100
@@ -123,7 +123,8 @@
 #define SKF_AD_NLATTR_NEST	16
 #define SKF_AD_MARK 	20
 #define SKF_AD_QUEUE	24
-#define SKF_AD_MAX	28
+#define SKF_AD_HATYPE	28
+#define SKF_AD_MAX	32
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff -ur linux-2.6.33.2.orig/net/core/filter.c linux-2.6.33.2/net/core/filter.c
--- linux-2.6.33.2.orig/net/core/filter.c	2010-04-02 00:02:33.000000000 +0100
+++ linux-2.6.33.2/net/core/filter.c	2010-04-22 14:19:24.000000000 +0100
@@ -301,6 +301,8 @@
 			A = skb->pkt_type;
 			continue;
 		case SKF_AD_IFINDEX:
+			if (!skb->dev)
+				return 0;
 			A = skb->dev->ifindex;
 			continue;
 		case SKF_AD_MARK:
@@ -309,6 +311,11 @@
 		case SKF_AD_QUEUE:
 			A = skb->queue_mapping;
 			continue;
+		case SKF_AD_HATYPE:
+			if (!skb->dev)
+				return 0;
+			A = skb->dev->type;
+			continue;
 		case SKF_AD_NLATTR: {
 			struct nlattr *nla;
 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* Re: DDoS attack causing bad effect on conntrack searches
From: Patrick McHardy @ 2010-04-22 13:17 UTC (permalink / raw)
  To: Changli Gao
  Cc: hawk, Eric Dumazet, Linux Kernel Network Hackers, netfilter-devel,
	Paul E McKenney
In-Reply-To: <q2h412e6f7f1004220613m488c2ee4r6d24a8d1e65997d4@mail.gmail.com>

Changli Gao wrote:
> On Thu, Apr 22, 2010 at 8:58 PM, Jesper Dangaard Brouer <hawk@comx.dk> wrote:
>> At an unnamed ISP, we experienced a DDoS attack against one of our
>> customers.  This attack also caused problems for one of our Linux
>> based routers.
>>
>> The attack was "only" generating 300 kpps (packets per sec), which
>> usually isn't a problem for this (fairly old) Linux Router.  But the
>> conntracking system chocked and reduced pps processing power to
>> 40kpps.
>>
>> I do extensive RRD/graph monitoring of the machines.  The IP conntrack
>> searches in the period exploded, to a stunning 700.000 searches per
>> sec.
>>
>> http://people.netfilter.org/hawk/DDoS/2010-04-12__001/conntrack_searches001.png
>>
>> First I though it might be caused by bad hashing, but after reading
>> the kernel code (func: __nf_conntrack_find()), I think its caused by
>> the loop restart (goto begin) of the conntrack search, running under
>> local_bh_disable().  These RCU changes to conntrack were introduced in
>> ea781f19 by Eric Dumazet.
>>
>> Code: net/netfilter/nf_conntrack_core.c
>> Func: __nf_conntrack_find()
>>
>> struct nf_conntrack_tuple_hash *
>> __nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple)
>> {
>>        struct nf_conntrack_tuple_hash *h;
>>        struct hlist_nulls_node *n;
>>        unsigned int hash = hash_conntrack(tuple);
>>
>>        /* Disable BHs the entire time since we normally need to disable them
>>         * at least once for the stats anyway.
>>         */
>>        local_bh_disable();
>> begin:
>>        hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
>>                if (nf_ct_tuple_equal(tuple, &h->tuple)) {
>>                        NF_CT_STAT_INC(net, found);
>>                        local_bh_enable();
>>                        return h;
>>                }
>>                NF_CT_STAT_INC(net, searched);
>>        }
>>        /*
>>         * if the nulls value we got at the end of this lookup is
>>         * not the expected one, we must restart lookup.
>>         * We probably met an item that was moved to another chain.
>>         */
>>        if (get_nulls_value(n) != hash)
>>                goto begin;
>>        local_bh_enable();
>>
> 
> We should add a retry limit there.

We can't do that since that would allow false negatives.

^ permalink raw reply

* Re: DDoS attack causing bad effect on conntrack searches
From: Changli Gao @ 2010-04-22 13:13 UTC (permalink / raw)
  To: hawk
  Cc: Eric Dumazet, Patrick McHardy, Linux Kernel Network Hackers,
	netfilter-devel, Paul E McKenney
In-Reply-To: <1271941082.14501.189.camel@jdb-workstation>

On Thu, Apr 22, 2010 at 8:58 PM, Jesper Dangaard Brouer <hawk@comx.dk> wrote:
>
> At an unnamed ISP, we experienced a DDoS attack against one of our
> customers.  This attack also caused problems for one of our Linux
> based routers.
>
> The attack was "only" generating 300 kpps (packets per sec), which
> usually isn't a problem for this (fairly old) Linux Router.  But the
> conntracking system chocked and reduced pps processing power to
> 40kpps.
>
> I do extensive RRD/graph monitoring of the machines.  The IP conntrack
> searches in the period exploded, to a stunning 700.000 searches per
> sec.
>
> http://people.netfilter.org/hawk/DDoS/2010-04-12__001/conntrack_searches001.png
>
> First I though it might be caused by bad hashing, but after reading
> the kernel code (func: __nf_conntrack_find()), I think its caused by
> the loop restart (goto begin) of the conntrack search, running under
> local_bh_disable().  These RCU changes to conntrack were introduced in
> ea781f19 by Eric Dumazet.
>
> Code: net/netfilter/nf_conntrack_core.c
> Func: __nf_conntrack_find()
>
> struct nf_conntrack_tuple_hash *
> __nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple)
> {
>        struct nf_conntrack_tuple_hash *h;
>        struct hlist_nulls_node *n;
>        unsigned int hash = hash_conntrack(tuple);
>
>        /* Disable BHs the entire time since we normally need to disable them
>         * at least once for the stats anyway.
>         */
>        local_bh_disable();
> begin:
>        hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
>                if (nf_ct_tuple_equal(tuple, &h->tuple)) {
>                        NF_CT_STAT_INC(net, found);
>                        local_bh_enable();
>                        return h;
>                }
>                NF_CT_STAT_INC(net, searched);
>        }
>        /*
>         * if the nulls value we got at the end of this lookup is
>         * not the expected one, we must restart lookup.
>         * We probably met an item that was moved to another chain.
>         */
>        if (get_nulls_value(n) != hash)
>                goto begin;
>        local_bh_enable();
>

We should add a retry limit there.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] Socket filter ancilliary data access for skb->dev->type
From: Patrick McHardy @ 2010-04-22 13:13 UTC (permalink / raw)
  To: Paul LeoNerd Evans; +Cc: netdev
In-Reply-To: <20100422131105.GS19334@cel.leo>

Paul LeoNerd Evans wrote:
> On Thu, Apr 22, 2010 at 02:28:46PM +0200, Patrick McHardy wrote:
>> I think we should be adding a check whether skb->dev is non-NULL here
>> since filters can also be attached to netlink sockets. The same applies
>> to SKF_AD_IFINDEX.
> 
> What should the appropriate behaviour be here? Set A to some rogue value
> - 0 or -1 seem appropriate? Or, abort the filter entirely (such as in
> e.g. divide-by-zero, or invalid memory buffer access)?
> 
> Either way that sounds simple enough, I can hack that in and resubmit.

I'd say we should abort execution.

^ permalink raw reply

* Re: [PATCH] Socket filter ancilliary data access for skb->dev->type
From: Paul LeoNerd Evans @ 2010-04-22 13:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev
In-Reply-To: <4BD040FE.3000809@trash.net>

[-- Attachment #1: Type: text/plain, Size: 644 bytes --]

On Thu, Apr 22, 2010 at 02:28:46PM +0200, Patrick McHardy wrote:
> I think we should be adding a check whether skb->dev is non-NULL here
> since filters can also be attached to netlink sockets. The same applies
> to SKF_AD_IFINDEX.

What should the appropriate behaviour be here? Set A to some rogue value
- 0 or -1 seem appropriate? Or, abort the filter entirely (such as in
e.g. divide-by-zero, or invalid memory buffer access)?

Either way that sounds simple enough, I can hack that in and resubmit.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* Re: [PATCH v5] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22 13:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, jamal, Tom Herbert, netdev
In-Reply-To: <1271936227.7895.5285.camel@edumazet-laptop>

On Thu, Apr 22, 2010 at 7:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Please reorder things better.
>
> Most likely this function is called for one packet.
>
> In your version you take twice the rps_lock()/rps_unlock() path, so
> it'll be slower.
>
> Once to 'transfert' one list to process list
>
> Once to be able to do the 'label out:' post processing.
>

It doesn't make any difference. We have to hold rps_lock to update
input_pkt_queue_len, if we don't use another variable to record the
length of the process queue, or atomic variable.

I think it is better that using another variable to record the length
of the process queue, and updating it before process_backlog()
returns. For one packet, there is only one locking/unlocking. There is
only one issue you concerned: cache miss due to sum the two queues'
length. I'll change softnet_data to:

struct softnet_data {
        struct Qdisc            *output_queue;
        struct list_head        poll_list;
        struct sk_buff          *completion_queue;
        struct sk_buff_head     process_queue;

#ifdef CONFIG_RPS
        struct softnet_data     *rps_ipi_list;

        /* Elements below can be accessed between CPUs for RPS */
        struct call_single_data csd ____cacheline_aligned_in_smp;
        struct softnet_data     *rps_ipi_next;
        unsigned int            cpu;
        unsigned int            input_queue_head;
#endif
        unsigned int            process_queue_len;
        struct sk_buff_head     input_pkt_queue;
        struct napi_struct      backlog;
};

For one packets, we have to update process_queue_len in any way. For
more packets, we only change process_queue_len just before
process_backlog() returns. It means that process_queue_len change is
batched.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* DDoS attack causing bad effect on conntrack searches
From: Jesper Dangaard Brouer @ 2010-04-22 12:58 UTC (permalink / raw)
  To: Eric Dumazet, Patrick McHardy
  Cc: Linux Kernel Network Hackers, netfilter-devel, Paul E McKenney


At an unnamed ISP, we experienced a DDoS attack against one of our
customers.  This attack also caused problems for one of our Linux
based routers.

The attack was "only" generating 300 kpps (packets per sec), which
usually isn't a problem for this (fairly old) Linux Router.  But the
conntracking system chocked and reduced pps processing power to
40kpps.

I do extensive RRD/graph monitoring of the machines.  The IP conntrack
searches in the period exploded, to a stunning 700.000 searches per
sec.

http://people.netfilter.org/hawk/DDoS/2010-04-12__001/conntrack_searches001.png

First I though it might be caused by bad hashing, but after reading
the kernel code (func: __nf_conntrack_find()), I think its caused by
the loop restart (goto begin) of the conntrack search, running under
local_bh_disable().  These RCU changes to conntrack were introduced in
ea781f19 by Eric Dumazet.

Code: net/netfilter/nf_conntrack_core.c
Func: __nf_conntrack_find()

struct nf_conntrack_tuple_hash *
__nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple)
{
	struct nf_conntrack_tuple_hash *h;
	struct hlist_nulls_node *n;
	unsigned int hash = hash_conntrack(tuple);

	/* Disable BHs the entire time since we normally need to disable them
	 * at least once for the stats anyway.
	 */
	local_bh_disable();
begin:
	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
		if (nf_ct_tuple_equal(tuple, &h->tuple)) {
			NF_CT_STAT_INC(net, found);
			local_bh_enable();
			return h;
		}
		NF_CT_STAT_INC(net, searched);
	}
	/*
	 * if the nulls value we got at the end of this lookup is
	 * not the expected one, we must restart lookup.
	 * We probably met an item that was moved to another chain.
	 */
	if (get_nulls_value(n) != hash)
		goto begin;
	local_bh_enable();

	return NULL;
}

>From the graphs:
 http://people.netfilter.org/hawk/DDoS/2010-04-12__001/list.html

Its possible to see, that the problems are most likely caused by the
number of conntrack elements being deleted.

http://people.netfilter.org/hawk/DDoS/2010-04-12__001/conntrack_delete001.png

If you look closely at the graphs, you should be able to see, that
CPU1 is doing all the conntrack "searches", and CPU2 is doing most of
the conntrack "deletes" (and CPU1 is creating a lot of new entries).


The question is, how do we avoid this unfortunately behavior of the
delete process disturbing the search process (causing it into
looping)?


-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network Kernel Developer
  Cand. Scient Datalog / MSc.CS
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer


Extra info: Conntrack tuning
-----------
I have tuned the conntrack system on these hosts.  Firstly I have
increased the number of hash buckets for the conntrack system to
around 300.000.

 cat /sys/module/nf_conntrack/parameters/hashsize
 300032

Next I have increased the max conntracking elements to 900.000.

 cat /proc/sys/net/nf_conntrack_max
 900000




^ permalink raw reply

* [PATCH NEXT 7/8] qlcnic: protect resource access
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

We do netif_device_attach, even if resource allocation fails.
Driver callbacks can be called, if device is attached.
All these callbacks need to be protected by ADAPTER_UP_MAGIC check.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_main.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index bfc5510..ee573fe 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -208,6 +208,9 @@ qlcnic_napi_enable(struct qlcnic_adapter *adapter)
 	struct qlcnic_host_sds_ring *sds_ring;
 	struct qlcnic_recv_context *recv_ctx = &adapter->recv_ctx;
 
+	if (adapter->is_up != QLCNIC_ADAPTER_UP_MAGIC)
+		return;
+
 	for (ring = 0; ring < adapter->max_sds_rings; ring++) {
 		sds_ring = &recv_ctx->sds_rings[ring];
 		napi_enable(&sds_ring->napi);
@@ -222,6 +225,9 @@ qlcnic_napi_disable(struct qlcnic_adapter *adapter)
 	struct qlcnic_host_sds_ring *sds_ring;
 	struct qlcnic_recv_context *recv_ctx = &adapter->recv_ctx;
 
+	if (adapter->is_up != QLCNIC_ADAPTER_UP_MAGIC)
+		return;
+
 	for (ring = 0; ring < adapter->max_sds_rings; ring++) {
 		sds_ring = &recv_ctx->sds_rings[ring];
 		qlcnic_disable_int(sds_ring);
@@ -1573,6 +1579,11 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	int frag_count, no_of_desc;
 	u32 num_txd = tx_ring->num_desc;
 
+	if (!test_bit(__QLCNIC_DEV_UP, &adapter->state)) {
+		netif_stop_queue(netdev);
+		return NETDEV_TX_BUSY;
+	}
+
 	frag_count = skb_shinfo(skb)->nr_frags + 1;
 
 	/* 4 fragments per cmd des */
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 6/8] qlcnic: fix rcv buffer leak
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

Rcv producer value should be read in spin-lock.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_init.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 9ef9f58..1b621ca 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -1554,9 +1554,10 @@ qlcnic_post_rx_buffers(struct qlcnic_adapter *adapter, u32 ringid,
 	int producer, count = 0;
 	struct list_head *head;
 
+	spin_lock(&rds_ring->lock);
+
 	producer = rds_ring->producer;
 
-	spin_lock(&rds_ring->lock);
 	head = &rds_ring->free_list;
 	while (!list_empty(head)) {
 
@@ -1578,13 +1579,13 @@ qlcnic_post_rx_buffers(struct qlcnic_adapter *adapter, u32 ringid,
 
 		producer = get_next_index(producer, rds_ring->num_desc);
 	}
-	spin_unlock(&rds_ring->lock);
 
 	if (count) {
 		rds_ring->producer = producer;
 		writel((producer-1) & (rds_ring->num_desc-1),
 				rds_ring->crb_rcv_producer);
 	}
+	spin_unlock(&rds_ring->lock);
 }
 
 static void
@@ -1596,10 +1597,11 @@ qlcnic_post_rx_buffers_nodb(struct qlcnic_adapter *adapter,
 	int producer, count = 0;
 	struct list_head *head;
 
-	producer = rds_ring->producer;
 	if (!spin_trylock(&rds_ring->lock))
 		return;
 
+	producer = rds_ring->producer;
+
 	head = &rds_ring->free_list;
 	while (!list_empty(head)) {
 
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 8/8] qlcnic: update version 5.0.2
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

Update version to indicate IDC(fw recovery) changes.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 6c1da71..2fba9cd 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -51,8 +51,8 @@
 
 #define _QLCNIC_LINUX_MAJOR 5
 #define _QLCNIC_LINUX_MINOR 0
-#define _QLCNIC_LINUX_SUBVERSION 1
-#define QLCNIC_LINUX_VERSIONID  "5.0.1"
+#define _QLCNIC_LINUX_SUBVERSION 2
+#define QLCNIC_LINUX_VERSIONID  "5.0.2"
 
 #define QLCNIC_VERSION_CODE(a, b, c)	(((a) << 24) + ((b) << 16) + (c))
 #define _major(v)	(((v) >> 24) & 0xff)
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 3/8] qlcnic: fix fw initialization responsibility
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

Now any pci-func can start fw, whoever sees the reset ack first.
Before this, pci-func which sets the RESET state has the responsibility
to start fw.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_main.c |   66 +++++++++++++++++++++----------------
 1 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index ff7705b..0634990 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2015,6 +2015,7 @@ err:
 	clear_bit(__QLCNIC_RESETTING, &adapter->state);
 }
 
+/* Grab api lock, before checking state */
 static int
 qlcnic_check_drv_state(struct qlcnic_adapter *adapter)
 {
@@ -2037,6 +2038,9 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 	u8 dev_init_timeo = adapter->dev_init_timeo;
 	int portnum = adapter->portnum;
 
+	if (test_and_clear_bit(__QLCNIC_START_FW, &adapter->state))
+		return 1;
+
 	if (qlcnic_api_lock(adapter))
 		return -1;
 
@@ -2044,8 +2048,6 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 	if (!(val & ((int)0x1 << (portnum * 4)))) {
 		val |= ((u32)0x1 << (portnum * 4));
 		QLCWR32(adapter, QLCNIC_CRB_DEV_REF_COUNT, val);
-	} else if (test_and_clear_bit(__QLCNIC_START_FW, &adapter->state)) {
-		goto start_fw;
 	}
 
 	prev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
@@ -2053,7 +2055,6 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 
 	switch (prev_state) {
 	case QLCNIC_DEV_COLD:
-start_fw:
 		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_INITIALIZING);
 		qlcnic_api_unlock(adapter);
 		return 1;
@@ -2113,51 +2114,59 @@ qlcnic_fwinit_work(struct work_struct *work)
 {
 	struct qlcnic_adapter *adapter = container_of(work,
 			struct qlcnic_adapter, fw_work.work);
-	int dev_state;
+	u32 dev_state = 0xf;
 
-	if (test_bit(__QLCNIC_START_FW, &adapter->state)) {
+	if (qlcnic_api_lock(adapter))
+		goto err_ret;
 
-		if (qlcnic_check_drv_state(adapter) &&
-			(adapter->fw_wait_cnt++ < adapter->reset_ack_timeo)) {
-			qlcnic_schedule_work(adapter,
-					qlcnic_fwinit_work, FW_POLL_DELAY);
-			return;
+	if (adapter->fw_wait_cnt++ > adapter->reset_ack_timeo) {
+		dev_err(&adapter->pdev->dev, "Reset:Failed to get ack %d sec\n",
+					adapter->reset_ack_timeo);
+		goto skip_ack_check;
+	}
+
+	if (!qlcnic_check_drv_state(adapter)) {
+skip_ack_check:
+		dev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
+		if (dev_state == QLCNIC_DEV_NEED_RESET) {
+			QLCWR32(adapter, QLCNIC_CRB_DEV_STATE,
+						QLCNIC_DEV_INITIALIZING);
+			set_bit(__QLCNIC_START_FW, &adapter->state);
+			QLCDB(adapter, DRV, "Restarting fw\n");
 		}
 
-		QLCDB(adapter, DRV, "Resetting FW\n");
+		qlcnic_api_unlock(adapter);
+
 		if (!qlcnic_start_firmware(adapter)) {
 			qlcnic_schedule_work(adapter, qlcnic_attach_work, 0);
 			return;
 		}
-
 		goto err_ret;
 	}
 
-	if (adapter->fw_wait_cnt++ > (adapter->dev_init_timeo / 2)) {
-		dev_err(&adapter->pdev->dev,
-				"Waiting for device to reset timeout\n");
-		goto err_ret;
-	}
+	qlcnic_api_unlock(adapter);
 
 	dev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
-	QLCDB(adapter, HW, "Func waiting: Device state=%d\n", dev_state);
+	QLCDB(adapter, HW, "Func waiting: Device state=%u\n", dev_state);
 
 	switch (dev_state) {
-	case QLCNIC_DEV_READY:
-		if (!qlcnic_start_firmware(adapter)) {
-			qlcnic_schedule_work(adapter, qlcnic_attach_work, 0);
-			return;
-		}
+	case QLCNIC_DEV_NEED_RESET:
+		qlcnic_schedule_work(adapter,
+			qlcnic_fwinit_work, FW_POLL_DELAY);
+		return;
 	case QLCNIC_DEV_FAILED:
 		break;
 
 	default:
-		qlcnic_schedule_work(adapter,
-			qlcnic_fwinit_work, 2 * FW_POLL_DELAY);
-		return;
+		if (!qlcnic_start_firmware(adapter)) {
+			qlcnic_schedule_work(adapter, qlcnic_attach_work, 0);
+			return;
+		}
 	}
 
 err_ret:
+	dev_err(&adapter->pdev->dev, "Fwinit work failed state=%u "
+		"fw_wait_cnt=%u\n", dev_state, adapter->fw_wait_cnt);
 	netif_device_attach(adapter->netdev);
 	qlcnic_clr_all_drv_state(adapter);
 }
@@ -2202,6 +2211,7 @@ err_ret:
 
 }
 
+/*Transit to RESET state from READY state only */
 static void
 qlcnic_dev_request_reset(struct qlcnic_adapter *adapter)
 {
@@ -2212,10 +2222,8 @@ qlcnic_dev_request_reset(struct qlcnic_adapter *adapter)
 
 	state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
 
-	if (state != QLCNIC_DEV_INITIALIZING &&
-					state != QLCNIC_DEV_NEED_RESET) {
+	if (state == QLCNIC_DEV_READY) {
 		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_NEED_RESET);
-		set_bit(__QLCNIC_START_FW, &adapter->state);
 		QLCDB(adapter, DRV, "NEED_RESET state set\n");
 	}
 
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 2/8] qlcnic: fix defines as per IDC document
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Anirban Chakraborty
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

Different class of drivers co-exist for CNA device,
there is some minimal interaction that will be required amongst
the drivers for performing some device level operations.

All the driver should follow inter driver coexistence document.

Fixing polling interval and spelling mistake.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_hdr.h  |   22 +++++++++++-----------
 drivers/net/qlcnic/qlcnic_main.c |    9 +++++++--
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index 51fa3fb..8285a06 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -694,17 +694,18 @@ enum {
 #define QLCNIC_CRB_DRV_STATE               (QLCNIC_CAM_RAM(0x144))
 #define QLCNIC_CRB_DRV_SCRATCH             (QLCNIC_CAM_RAM(0x148))
 #define QLCNIC_CRB_DEV_PARTITION_INFO      (QLCNIC_CAM_RAM(0x14c))
-#define QLCNIC_CRB_DRV_IDC_VER             (QLCNIC_CAM_RAM(0x14c))
+#define QLCNIC_CRB_DRV_IDC_VER		(QLCNIC_CAM_RAM(0x174))
 #define QLCNIC_ROM_DEV_INIT_TIMEOUT	(0x3e885c)
 #define QLCNIC_ROM_DRV_RESET_TIMEOUT	(0x3e8860)
 
-		 /* Device State */
-#define QLCNIC_DEV_COLD 		1
-#define QLCNIC_DEV_INITALIZING		2
-#define QLCNIC_DEV_READY		3
-#define QLCNIC_DEV_NEED_RESET		4
-#define QLCNIC_DEV_NEED_QUISCENT	5
-#define QLCNIC_DEV_FAILED		6
+/* Device State */
+#define QLCNIC_DEV_COLD			0x1
+#define QLCNIC_DEV_INITIALIZING		0x2
+#define QLCNIC_DEV_READY		0x3
+#define QLCNIC_DEV_NEED_RESET		0x4
+#define QLCNIC_DEV_NEED_QUISCENT	0x5
+#define QLCNIC_DEV_FAILED		0x6
+#define QLCNIC_DEV_QUISCENT		0x7
 
 #define QLCNIC_RCODE_DRIVER_INFO		0x20000000
 #define QLCNIC_RCODE_DRIVER_CAN_RELOAD		0x40000000
@@ -712,9 +713,8 @@ enum {
 #define QLCNIC_FWERROR_PEGNUM(code)		((code) & 0xff)
 #define QLCNIC_FWERROR_CODE(code)		((code >> 8) & 0xfffff)
 
-#define FW_POLL_DELAY			(2 * HZ)
-#define FW_FAIL_THRESH			3
-#define FW_POLL_THRESH			10
+#define FW_POLL_DELAY		(1 * HZ)
+#define FW_FAIL_THRESH		2
 
 #define	ISR_MSI_INT_TRIGGER(FUNC) (QLCNIC_PCIX_PS_REG(PCIX_MSI_F(FUNC)))
 #define ISR_LEGACY_INT_TRIGGERED(VAL)	(((VAL) & 0x300) == 0x200)
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 5845dc0..ff7705b 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -2054,7 +2054,7 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 	switch (prev_state) {
 	case QLCNIC_DEV_COLD:
 start_fw:
-		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_INITALIZING);
+		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_INITIALIZING);
 		qlcnic_api_unlock(adapter);
 		return 1;
 
@@ -2077,6 +2077,10 @@ start_fw:
 	case QLCNIC_DEV_FAILED:
 		qlcnic_api_unlock(adapter);
 		return -1;
+
+	case QLCNIC_DEV_INITIALIZING:
+	case QLCNIC_DEV_QUISCENT:
+		break;
 	}
 
 	qlcnic_api_unlock(adapter);
@@ -2208,7 +2212,8 @@ qlcnic_dev_request_reset(struct qlcnic_adapter *adapter)
 
 	state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
 
-	if (state != QLCNIC_DEV_INITALIZING && state != QLCNIC_DEV_NEED_RESET) {
+	if (state != QLCNIC_DEV_INITIALIZING &&
+					state != QLCNIC_DEV_NEED_RESET) {
 		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_NEED_RESET);
 		set_bit(__QLCNIC_START_FW, &adapter->state);
 		QLCDB(adapter, DRV, "NEED_RESET state set\n");
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 1/8] qlcnic: additional driver statistics
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

Added additional driver statistics to track errors in rcv/tx path.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h         |    4 ++++
 drivers/net/qlcnic/qlcnic_ethtool.c |    8 ++++++++
 drivers/net/qlcnic/qlcnic_init.c    |    7 ++++++-
 drivers/net/qlcnic/qlcnic_main.c    |    4 +++-
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 28c148c..6c1da71 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -428,6 +428,10 @@ struct qlcnic_adapter_stats {
 	u64  xmit_on;
 	u64  xmit_off;
 	u64  skb_alloc_failure;
+	u64  null_skb;
+	u64  null_rxbuf;
+	u64  rx_dma_map_error;
+	u64  tx_dma_map_error;
 };
 
 /*
diff --git a/drivers/net/qlcnic/qlcnic_ethtool.c b/drivers/net/qlcnic/qlcnic_ethtool.c
index 08d6f10..6cdc5eb 100644
--- a/drivers/net/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/qlcnic/qlcnic_ethtool.c
@@ -69,6 +69,14 @@ static const struct qlcnic_stats qlcnic_gstrings_stats[] = {
 		QLC_SIZEOF(stats.xmit_off), QLC_OFF(stats.xmit_off)},
 	{"skb_alloc_failure", QLC_SIZEOF(stats.skb_alloc_failure),
 		QLC_OFF(stats.skb_alloc_failure)},
+	{"null skb",
+		QLC_SIZEOF(stats.null_skb), QLC_OFF(stats.null_skb)},
+	{"null rxbuf",
+		QLC_SIZEOF(stats.null_rxbuf), QLC_OFF(stats.null_rxbuf)},
+	{"rx dma map error", QLC_SIZEOF(stats.rx_dma_map_error),
+					 QLC_OFF(stats.rx_dma_map_error)},
+	{"tx dma map error", QLC_SIZEOF(stats.tx_dma_map_error),
+					 QLC_OFF(stats.tx_dma_map_error)},
 
 };
 
diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 01ce74e..9ef9f58 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -1287,6 +1287,7 @@ qlcnic_alloc_rx_skb(struct qlcnic_adapter *adapter,
 			rds_ring->dma_size, PCI_DMA_FROMDEVICE);
 
 	if (pci_dma_mapping_error(pdev, dma)) {
+		adapter->stats.rx_dma_map_error++;
 		dev_kfree_skb_any(skb);
 		buffer->skb = NULL;
 		return -ENOMEM;
@@ -1311,8 +1312,10 @@ static struct sk_buff *qlcnic_process_rxbuf(struct qlcnic_adapter *adapter,
 			PCI_DMA_FROMDEVICE);
 
 	skb = buffer->skb;
-	if (!skb)
+	if (!skb) {
+		adapter->stats.null_skb++;
 		goto no_skb;
+	}
 
 	if (likely(adapter->rx_csum && cksum == STATUS_CKSUM_OK)) {
 		adapter->stats.csummed++;
@@ -1502,6 +1505,8 @@ qlcnic_process_rcv_ring(struct qlcnic_host_sds_ring *sds_ring, int max)
 
 		if (rxbuf)
 			list_add_tail(&rxbuf->list, &sds_ring->free_list[ring]);
+		else
+			adapter->stats.null_rxbuf++;
 
 skip:
 		for (; desc_cnt > 0; desc_cnt--) {
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index e4fd5dc..5845dc0 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1589,8 +1589,10 @@ qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
 	pdev = adapter->pdev;
 
-	if (qlcnic_map_tx_skb(pdev, skb, pbuf))
+	if (qlcnic_map_tx_skb(pdev, skb, pbuf)) {
+		adapter->stats.tx_dma_map_error++;
 		goto drop_packet;
+	}
 
 	pbuf->skb = skb;
 	pbuf->frag_count = frag_count;
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 0/8]qlcnic: inter driver coexistence fixes
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman

Hi
   Different class of drivers co-exist for CNA device, there is some
   minimal interaction that will be required amongst the drivers for 
   performing some device level operations. Operations such as 
   device initialization, device recovery and device diagnostics test,
   can potentially affect other function drivers.
 
   All these drivers should follow common protocol(IDC) to do such operation.

   Fixing qlcnic driver according to Inter driver coexistence.
   
   Sending series of 8 patches, please apply them in net-next.

-Amit 

^ permalink raw reply

* [PATCH NEXT 4/8] qlcnic: define macro for driver state
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

Defining macro to set and clear driver state.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_hdr.h  |    6 ++++++
 drivers/net/qlcnic/qlcnic_main.c |   22 +++++++++++-----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index 8285a06..a984cd2 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -707,6 +707,12 @@ enum {
 #define QLCNIC_DEV_FAILED		0x6
 #define QLCNIC_DEV_QUISCENT		0x7
 
+#define QLC_DEV_SET_REF_CNT(VAL, FN)		((VAL) |= (1 << (FN * 4)))
+#define QLC_DEV_CLR_REF_CNT(VAL, FN)		((VAL) &= ~(1 << (FN * 4)))
+#define QLC_DEV_SET_RST_RDY(VAL, FN)		((VAL) |= (1 << (FN * 4)))
+#define QLC_DEV_SET_QSCNT_RDY(VAL, FN)		((VAL) |= (2 << (FN * 4)))
+#define QLC_DEV_CLR_RST_QSCNT(VAL, FN)		((VAL) &= ~(3 << (FN * 4)))
+
 #define QLCNIC_RCODE_DRIVER_INFO		0x20000000
 #define QLCNIC_RCODE_DRIVER_CAN_RELOAD		0x40000000
 #define QLCNIC_RCODE_FATAL_ERROR		0x80000000
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 0634990..3c8a963 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1963,9 +1963,9 @@ qlcnic_set_drv_state(struct qlcnic_adapter *adapter, int state)
 	val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
 
 	if (state == QLCNIC_DEV_NEED_RESET)
-		val |= ((u32)0x1 << (adapter->portnum * 4));
+		QLC_DEV_SET_RST_RDY(val, adapter->portnum);
 	else if (state == QLCNIC_DEV_NEED_QUISCENT)
-		val |= ((u32)0x1 << ((adapter->portnum * 4) + 1));
+		QLC_DEV_SET_QSCNT_RDY(val, adapter->portnum);
 
 	QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 
@@ -1981,7 +1981,7 @@ qlcnic_clr_drv_state(struct qlcnic_adapter *adapter)
 		return -EBUSY;
 
 	val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
-	val &= ~((u32)0x3 << (adapter->portnum * 4));
+	QLC_DEV_CLR_RST_QSCNT(val, adapter->portnum);
 	QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 
 	qlcnic_api_unlock(adapter);
@@ -1998,14 +1998,14 @@ qlcnic_clr_all_drv_state(struct qlcnic_adapter *adapter)
 		goto err;
 
 	val = QLCRD32(adapter, QLCNIC_CRB_DEV_REF_COUNT);
-	val &= ~((u32)0x1 << (adapter->portnum * 4));
+	QLC_DEV_CLR_REF_CNT(val, adapter->portnum);
 	QLCWR32(adapter, QLCNIC_CRB_DEV_REF_COUNT, val);
 
 	if (!(val & 0x11111111))
 		QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_COLD);
 
 	val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
-	val &= ~((u32)0x3 << (adapter->portnum * 4));
+	QLC_DEV_CLR_RST_QSCNT(val, adapter->portnum);
 	QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 
 	qlcnic_api_unlock(adapter);
@@ -2036,7 +2036,7 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 {
 	u32 val, prev_state;
 	u8 dev_init_timeo = adapter->dev_init_timeo;
-	int portnum = adapter->portnum;
+	u8 portnum = adapter->portnum;
 
 	if (test_and_clear_bit(__QLCNIC_START_FW, &adapter->state))
 		return 1;
@@ -2045,8 +2045,8 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 		return -1;
 
 	val = QLCRD32(adapter, QLCNIC_CRB_DEV_REF_COUNT);
-	if (!(val & ((int)0x1 << (portnum * 4)))) {
-		val |= ((u32)0x1 << (portnum * 4));
+	if (!(val & (1 << (portnum * 4)))) {
+		QLC_DEV_SET_REF_CNT(val, portnum);
 		QLCWR32(adapter, QLCNIC_CRB_DEV_REF_COUNT, val);
 	}
 
@@ -2065,13 +2065,13 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 
 	case QLCNIC_DEV_NEED_RESET:
 		val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
-		val |= ((u32)0x1 << (portnum * 4));
+		QLC_DEV_SET_RST_RDY(val, portnum);
 		QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 		break;
 
 	case QLCNIC_DEV_NEED_QUISCENT:
 		val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
-		val |= ((u32)0x1 << ((portnum * 4) + 1));
+		QLC_DEV_SET_QSCNT_RDY(val, portnum);
 		QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 		break;
 
@@ -2101,7 +2101,7 @@ qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 		return -1;
 
 	val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
-	val &= ~((u32)0x3 << (portnum * 4));
+	QLC_DEV_CLR_RST_QSCNT(val, portnum);
 	QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 
 	qlcnic_api_unlock(adapter);
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 5/8] qlcnic: fix pci semaphore checks
From: Amit Kumar Salecha @ 2010-04-22 12:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1271940702-17064-1-git-send-email-amit.salecha@qlogic.com>

Driver should not go ahead with fw recovery if fails to acquire
semaphore.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_main.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 3c8a963..bfc5510 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1949,8 +1949,8 @@ static void qlcnic_poll_controller(struct net_device *netdev)
 }
 #endif
 
-static void
-qlcnic_set_drv_state(struct qlcnic_adapter *adapter, int state)
+static int
+qlcnic_set_drv_state(struct qlcnic_adapter *adapter, u8 state)
 {
 	u32  val;
 
@@ -1958,7 +1958,7 @@ qlcnic_set_drv_state(struct qlcnic_adapter *adapter, int state)
 			state != QLCNIC_DEV_NEED_QUISCENT);
 
 	if (qlcnic_api_lock(adapter))
-		return ;
+		return -EIO;
 
 	val = QLCRD32(adapter, QLCNIC_CRB_DRV_STATE);
 
@@ -1970,6 +1970,8 @@ qlcnic_set_drv_state(struct qlcnic_adapter *adapter, int state)
 	QLCWR32(adapter, QLCNIC_CRB_DRV_STATE, val);
 
 	qlcnic_api_unlock(adapter);
+
+	return 0;
 }
 
 static int
@@ -2195,7 +2197,8 @@ qlcnic_detach_work(struct work_struct *work)
 	if (adapter->temp == QLCNIC_TEMP_PANIC)
 		goto err_ret;
 
-	qlcnic_set_drv_state(adapter, adapter->dev_state);
+	if (qlcnic_set_drv_state(adapter, adapter->dev_state))
+		goto err_ret;
 
 	adapter->fw_wait_cnt = 0;
 
-- 
1.6.0.2


^ permalink raw reply related

* Re: [net-next,1/2] add iovnl netlink support
From: Arnd Bergmann @ 2010-04-22 12:49 UTC (permalink / raw)
  To: Scott Feldman; +Cc: Chris Wright, davem, netdev
In-Reply-To: <C7F4DE39.2A4C3%scofeldm@cisco.com>

On Thursday 22 April 2010, Scott Feldman wrote:
> On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > On Wednesday 21 April 2010, Scott Feldman wrote:
> >> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> >> You're right, not needed for enic since mac addr is included with
> >> port-profile push and vlan membership is implied by port-profile.  So I put
> >> set_mac_vlan in there basically to elicit feedback.
> > 
> > Ok. Two points though:
> > 
> > - when you say that the mac address is included in the port-profile push,
> >   does that imply that the VF does not have a mac address prior to this?
> 
> Correct, VF has no mac addr prior to port-profile being applied.  The
> mac_addr is the mac_addr of the VM guest interface that's to use the VF.  If
> the port-profile defines L2 mac spoofing, for example, the switch wants to
> know the mac address before i/o starts.   I/o doesn't start until
> port-profile is applied and the switch virtual port is setup.

Is it possible to split this this process, in order to make it more
closely resemble what we have when the registration is in user space?
This would mean that you assign a MAC address to the interface when the
interface gets created, and register the same MAC address at the switch
independent from the creation.

Obviously, if the port-profile (for enic) or the VSI list in the switch
enforces a the mac address and you pass one that's different from the
one that's set in the VF, it won't be able to send any data, but it
remains the job of the switch to enforce that case.

> It's not just a VLAN ID, but the entire VLAN membership for the switch
> virtual port.  The port-profile may define a single native VLAN for access
> mode on the switch port, or a trunk mode with a list of allowed vlans, with
> on native vlan.
> 
> The key is the port-profile.  The port-profile resolves the configuration of
> the switch virtual port.  The configuration of the switch virtual port
> includes many setting like I mentioned earlier: VLAN membership, QoS (rate
> limits, priority class, L2 security, etc).

Ok, I see.

> > If we integrate the iovnl client into iproute2, the sequence for setting
> > up an enic VF and associating it to the port profile could be
> > 
> > # create vf0, pass mac and vlan id to HW, no association yet
> > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> > 
> > # associate vf with port profile, mac address must match the one assigned
> > #  to the interface before.
> > ip iov assoc eth0 port-profile "general" host-uuid
> > "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> > mac fe:dc:ba:12:34:56
> 
> Ya, that sounds pretty close.  I still want the flexibility to direct ops to
> a PF link for a VF link.

Does that mean you require passing both the PF and the VF name?

	Arnd

^ permalink raw reply

* Re: [patch] bluetooth: handle l2cap_create_connless_pdu() errors
From: Gustavo F. Padovan @ 2010-04-22 12:33 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Marcel Holtmann, David S. Miller, Andrei Emeltchenko,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100422095201.GO29647@bicker>

Hi Dan,

* Dan Carpenter <error27-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> [2010-04-22 11:52:01 +0200]:

> l2cap_create_connless_pdu() can sometimes return ERR_PTR(-ENOMEM) or
> ERR_PTR(-EFAULT).
> 
> Signed-off-by: Dan Carpenter <error27-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
> index 99d68c3..9753b69 100644
> --- a/net/bluetooth/l2cap.c
> +++ b/net/bluetooth/l2cap.c
> @@ -1626,7 +1626,10 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct ms
>  	/* Connectionless channel */
>  	if (sk->sk_type == SOCK_DGRAM) {
>  		skb = l2cap_create_connless_pdu(sk, msg, len);
> -		err = l2cap_do_send(sk, skb);
> +		if (IS_ERR(skb))
> +			err = PTR_ERR(skb);
> +		else
> +			err = l2cap_do_send(sk, skb);
>  		goto done;
>  	}

Your fix looks ok, but we have changed l2cap_do_send() to void, so you
have to modify your patch.

That change is not in the bluetooth-testingtree yet, so remote add my git
tree and use the branch ertm.

git://git.profusion.mobi/users/padovan/bluetooth-testing.git

Regards,

-- 
Gustavo F. Padovan
http://padovan.org

^ permalink raw reply

* Re: [PATCH] Socket filter ancilliary data access for skb->dev->type
From: Patrick McHardy @ 2010-04-22 12:28 UTC (permalink / raw)
  To: Paul LeoNerd Evans; +Cc: netdev
In-Reply-To: <20100422121253.GR19334@cel.leo>

Paul LeoNerd Evans wrote:
> Add an SKF_AD_HATYPE field to the packet ancilliary data area, giving
> access to skb->dev->type, as reported in the sll_hatype field.
> 
> When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
> interfaces, there doesn't appear to be a way for the filter program to
> actually find out the underlying hardware type the packet was captured
> on. This patch adds such ability.
> 
> +		case SKF_AD_HATYPE:
> +			A = skb->dev->type;
> +			continue;

I think we should be adding a check whether skb->dev is non-NULL here
since filters can also be attached to netlink sockets. The same applies
to SKF_AD_IFINDEX.

^ permalink raw reply

* Re: [PATCH v5] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22 12:27 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: hadi, therbert, netdev
In-Reply-To: <20100422.024336.119875321.davem@davemloft.net>

On Thu, Apr 22, 2010 at 5:43 PM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Thu, 22 Apr 2010 17:09:17 +0800
>
>> +     unsigned int            input_pkt_queue_len;
>> +     struct sk_buff          *input_pkt_queue_head;
>> +     struct sk_buff          **input_pkt_queue_tailp;
>> +
>
> Please do not ignore Stephen Hemminger's feedback.
>
> We already have enough odd SKB queue implementations, we
> do not need yet another one in a core location.  This makes
> it harder and harder to eventually convert sk_buff to use
> "struct list_head".
>
> Instead, use "struct sk_buff_head" and the lockless accessors
> (__skb_insert, etc.) and initializer (__skb_queue_head_init).
>

If I want to keep softnet_data small, I have to access the internal
fields of sk_buff_head, and modify them in a hack way. It doesn't
sound good. If not, softnet_data will become:

struct softnet_data {
        struct Qdisc            *output_queue;
        struct list_head        poll_list;
        struct sk_buff          *completion_queue;
        struct sk_buff_head     process_queue;

#ifdef CONFIG_RPS
        struct softnet_data     *rps_ipi_list;

        /* Elements below can be accessed between CPUs for RPS */
        struct call_single_data csd ____cacheline_aligned_in_smp;
        struct softnet_data     *rps_ipi_next;
        unsigned int            cpu;
        unsigned int            input_queue_head;
#endif
        unsigned int            input_pkt_queue_len;
        struct sk_buff_head     input_pkt_queue;
        struct napi_struct      backlog;
};

Eric, do you think it is too fat?

^ permalink raw reply

* Re: [PATCH v5] net: batch skb dequeueing from softnet input_pkt_queue
From: jamal @ 2010-04-22 12:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271936227.7895.5285.camel@edumazet-laptop>

On Thu, 2010-04-22 at 13:37 +0200, Eric Dumazet wrote:

> Please reorder things better.
> 
> Most likely this function is called for one packet.
> 
> In your version you take twice the rps_lock()/rps_unlock() path, so
> it'll be slower.
> 
> Once to 'transfert' one list to process list
> 
> Once to be able to do the 'label out:' post processing.
> 

Ok, once it is settled the right changes are i will test.

cheers,
jamal


^ permalink raw reply

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: jamal @ 2010-04-22 12:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, David S. Miller, netdev, Tom Herbert
In-Reply-To: <1271891149.7895.3751.camel@edumazet-laptop>

On Thu, 2010-04-22 at 01:05 +0200, Eric Dumazet wrote:


> [RFC] net: introduce a batch mode in process_backlog()
> 
> We see a lock contention on input_pkt_queue.lock in RPS benches.
> 
> As suggested by Changli Gao, we can batch several skbs at once in
> process_backlog(), so that we dirty input_pkt_queue less often.
> 

Ok, so i grab the latest and greatest net-next and apply this before
testing? Let me know..

cheers,
jamal


^ permalink raw reply

* [PATCH] Socket filter ancilliary data access for skb->dev->type
From: Paul LeoNerd Evans @ 2010-04-22 12:12 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

Add an SKF_AD_HATYPE field to the packet ancilliary data area, giving
access to skb->dev->type, as reported in the sll_hatype field.

When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
interfaces, there doesn't appear to be a way for the filter program to
actually find out the underlying hardware type the packet was captured
on. This patch adds such ability.

Signed-off-by: Paul Evans <leonerd@leonerd.org.uk>

---

diff -ur linux-2.6.33.2.orig/include/linux/filter.h linux-2.6.33.2/include/linux/filter.h
--- linux-2.6.33.2.orig/include/linux/filter.h	2010-04-02 00:02:33.000000000 +0100
+++ linux-2.6.33.2/include/linux/filter.h	2010-04-20 22:40:25.000000000 +0100
@@ -123,7 +123,8 @@
 #define SKF_AD_NLATTR_NEST	16
 #define SKF_AD_MARK 	20
 #define SKF_AD_QUEUE	24
-#define SKF_AD_MAX	28
+#define SKF_AD_HATYPE	28
+#define SKF_AD_MAX	32
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff -ur linux-2.6.33.2.orig/net/core/filter.c linux-2.6.33.2/net/core/filter.c
--- linux-2.6.33.2.orig/net/core/filter.c	2010-04-02 00:02:33.000000000 +0100
+++ linux-2.6.33.2/net/core/filter.c	2010-04-20 22:41:01.000000000 +0100
@@ -309,6 +309,9 @@
 		case SKF_AD_QUEUE:
 			A = skb->queue_mapping;
 			continue;
+		case SKF_AD_HATYPE:
+			A = skb->dev->type;
+			continue;
 		case SKF_AD_NLATTR: {
 			struct nlattr *nla;

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-22 12:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <1271876480.7895.3106.camel@edumazet-laptop>

On Wed, 2010-04-21 at 21:01 +0200, Eric Dumazet wrote:

> Drawback of using a fixed src ip from your generator is that all flows
> share the same struct dst entry on SUT. This might explain some glitches
> you noticed (ip_route_input + ip_rcv at high level on slave/application
> cpus)

yes, that would explain it ;-> I could have flows going to each cpu
generating different unique dst. It is good i didnt ;->

> Also note your test is one way. If some data was replied we would see
> much use of the 'flows'
> 

In my next step i wanted to "route" these packets at app level and for
this stage of testing just wanted to sink the data to reduce experiment
variables. Reason:
The netdev structure would hit a lot of cache misses if i started using
it to both send/recv since lots of things are shared on tx/rx (example
napi tx prunning could happen on either tx or receive path); same thing
with qdisc path which is at netdev granularity.. I think there may be
room for interesting improvements in this area..

> I notice epoll_ctl() used a lot, are you re-arming epoll each time you
> receive a datagram ?

I am using default libevent on debian. It looks very old and maybe
buggy. I will try to upgrade first and if still see the same
investigate.
  
> I see slave/application cpus hit _raw_spin_lock_irqsave() and  
> _raw_spin_unlock_irqrestore().
> 
> Maybe a ring buffer could help (instead of a double linked queue) for
> backlog, or the double queue trick, if Changli wants to respin his
> patch.
> 

Ok, I will have some cycles later today/tommorow or for sure on weekend.
My setup is still intact - so i can test.

cheers,
jamal


^ permalink raw reply

* Re: [PATCH linux-next 1/2] irq: Add CPU mask affinity hint callback framework
From: Peter P Waskiewicz Jr @ 2010-04-22 12:11 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: tglx@linutronix.de, davem@davemloft.net, arjan@linux.jf.intel.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1271854785.2101.17.camel@achroite.uk.solarflarecom.com>

On Wed, 21 Apr 2010, Ben Hutchings wrote:

> On Tue, 2010-04-20 at 11:01 -0700, Peter P Waskiewicz Jr wrote:
>> This patch adds a callback function pointer to the irq_desc
>> structure, along with a registration function and a read-only
>> proc entry for each interrupt.
>>
>> This affinity_hint handle for each interrupt can be used by
>> underlying drivers that need a better mechanism to control
>> interrupt affinity.  The underlying driver can register a
>> callback for the interrupt, which will allow the driver to
>> provide the CPU mask for the interrupt to anything that
>> requests it.  The intent is to extend the userspace daemon,
>> irqbalance, to help hint to it a preferred CPU mask to balance
>> the interrupt into.
>
> Doesn't it make more sense to have the driver follow affinity decisions
> made from user-space?  I realise that reallocating queues is disruptive
> and we probably don't want irqbalance to trigger that, but there should
> be a mechanism for the administrator to trigger it.

The driver here would be assisting userspace (irqbalance) to provide 
better details how the HW is laid out with respect to flows.  As it stands 
today, irqbalance is almost guaranteed to move interrups to CPUs that are 
not aligned with where applications are running for network adapters. 
This is very apparent when running at speeds in the 10 Gigabit range, or 
even multiple 1 Gigabit ports running at the same time.

>
> Looking at your patch for ixgbe:
>
> [...]
>> diff --git a/drivers/net/ixgbe/ixgbe_main.c
>> b/drivers/net/ixgbe/ixgbe_main.c
>> index 1b1419c..3e00d41 100644
>> --- a/drivers/net/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ixgbe/ixgbe_main.c
> [...]
>> @@ -1083,6 +1113,16 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
>>                         q_vector->eitr = adapter->rx_eitr_param;
>>
>>                 ixgbe_write_eitr(q_vector);
>> +
>> +               /*
>> +                * Allocate the affinity_hint cpumask, assign the mask for
>> +                * this vector, and register our affinity_hint callback.
>> +                */
>> +               alloc_cpumask_var(&q_vector->affinity_mask, GFP_KERNEL);
>> +               cpumask_set_cpu(v_idx, q_vector->affinity_mask);
>> +               irq_register_affinity_hint(adapter->msix_entries[v_idx].vector,
>> +                                          adapter,
>> +                                          &ixgbe_irq_affinity_callback);
>>         }
>>
>>         if (adapter->hw.mac.type == ixgbe_mac_82598EB)
> [...]
>
> This just assigns IRQs to the first n CPU threads.  Depending on the
> enumeration order, this might result in assigning an IRQ to each of 2
> threads on a core while leaving other cores unused!

This ixgbe patch is only meant to be an example of how you could use it. 
I didn't hammer out all the corner cases of interrupt alignment in it yet. 
However, ixgbe is already aligning Tx flows onto the CPU/queue pair the Tx 
occurred (i.e. Tx session from CPU 4 will be queued on Tx queue 4), and 
then uses our Flow Director HW offload to steer Rx to Rx queue 4, assuming 
that the interrupt for Rx queue 4 is affinitized to CPU 4.  The flow 
alignment breaks when the IRQ affinity has no knowledge what the 
underlying set of vectors are bound to, and what mode the HW is running 
in.

FCoE offloads that spread multiple SCSI exchange IDs across CPU cores also 
needs this to properly align things.  John Fastabend is going to provide 
some examples where this is very useful in the FCoE case.

> irqbalance can already find the various IRQs associated with a single
> net device by looking at the handler names.  So it can do at least as
> well as this without such a hint.  Unless drivers have *useful* hints to
> give, I don't see the point in adding this mechanism.

irqbalance identifies which interrupts go with which network device.  But 
it has no clue about flow management, and often will make a decision that 
hurts performance scaling.  I have data showing when scaling multiple 10 
Gigabit ports (4 in the current test), I can gain an extra 10 Gigabits of 
throughput just by aligning the interrupts properly (go from ~58 Gbps to 
~68 Gbps in bi-directional tests).

I do have the patches for irqbalance that uses this new handle to make 
better decisions for devices implementing the mask.  I can send those to 
help show the whole picture of what's happening.

Appreciate the feedback though Ben.

Cheers,
-PJ

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox