Netdev List
* Re: [PATCH 1/4 v2] net: Introduce sk_tx_queue_mapping
From: Eric Dumazet @ 2009-10-16  9:57 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: davem, netdev, herbert
In-Reply-To: <20091016072120.24384.54449.sendpatchset@localhost.localdomain>

Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
> 
> Introduce sk_tx_queue_mapping; and functions that set, test and get
> this value. Reset sk_tx_queue_mapping to -1 whenever the dst cache
> is set/reset, and in socket alloc & free (free probably doesn't need
> it).
> 

Could you please use a u16 instead, and adopt the convention that 0
is the 'uninitialized' value?

And define sk_tx_queue_clear(sk) to replace the sk_record_tx_queue(sk, -1) calls.

I also suggest using the following names:

static inline void sk_tx_queue_set(struct sock *sk, u16 tx_queue)
{
	sk->sk_tx_queue_mapping = tx_queue + 1;
}

static inline u16 sk_tx_queue_get(const struct sock *sk)
{
	return sk->sk_tx_queue_mapping - 1;
}

static inline void sk_tx_queue_clear(struct sock *sk) // or _reset
{
	sk->sk_tx_queue_mapping = 0;
}

static inline bool sk_tx_queue_recorded(const struct sock *sk)
{
	return (sk && sk->sk_tx_queue_mapping > 0);
}
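
A minimal userspace sketch of the off-by-one convention suggested above, assuming a stand-in `struct sock_model` (hypothetical, not the kernel's `struct sock`) with a single 16-bit field. It also shows one edge the convention introduces: queue number 65535 cannot be stored, because adding 1 wraps to 0 and would read back as "not recorded".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock; only the field of interest. */
struct sock_model {
	uint16_t tx_queue_mapping;	/* 0 = nothing recorded, else queue + 1 */
};

static inline void sk_tx_queue_set(struct sock_model *sk, uint16_t tx_queue)
{
	/* UINT16_MAX + 1 would wrap to 0, i.e. "not recorded". */
	assert(tx_queue < UINT16_MAX);
	sk->tx_queue_mapping = (uint16_t)(tx_queue + 1);
}

static inline uint16_t sk_tx_queue_get(const struct sock_model *sk)
{
	/* Only meaningful when sk_tx_queue_recorded() returns true. */
	return (uint16_t)(sk->tx_queue_mapping - 1);
}

static inline void sk_tx_queue_clear(struct sock_model *sk)
{
	sk->tx_queue_mapping = 0;
}

static inline bool sk_tx_queue_recorded(const struct sock_model *sk)
{
	return sk != NULL && sk->tx_queue_mapping > 0;
}
```

With this layout a zero-filled structure already means "no queue recorded", so no explicit initialization to -1 is needed in the allocation path.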


> @@ -1016,6 +1019,8 @@ static void sk_prot_free(struct proto *p
>  	struct kmem_cache *slab;
>  	struct module *owner;
>  
> +	sk_record_tx_queue(sk, -1);
> +
>  	owner = prot->owner;
>  	slab = prot->slab;
>  

This is not necessary; we are going to kfree(sk) anyway!


* Re: [PATCH 2/4 v2] net: Use sk_tx_queue_mapping for connected sockets
From: Eric Dumazet @ 2009-10-16  9:48 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: davem, netdev, herbert
In-Reply-To: <20091016072132.24384.38301.sendpatchset@localhost.localdomain>

Krishna Kumar wrote:
> From: Krishna Kumar <krkumar2@in.ibm.com>
> 
> For connected sockets, the first run of dev_pick_tx saves the
> calculated txq in sk_tx_queue_mapping. This is not saved if
> either skb rx was recorded, or if the device has a queue select
> handler. Next iterations of dev_pick_tx uses the cached value of
> sk_tx_queue_mapping.

Are we sure that, for a selection done by skb_tx_hash(dev, skb),
rx packets will use the same queue/cpu?

Probably not, since it uses sk->sk_hash (tcp/udp port):

u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
{
        u32 hash;

        if (skb_rx_queue_recorded(skb)) {
                hash = skb_get_rx_queue(skb);
                while (unlikely(hash >= dev->real_num_tx_queues))
                        hash -= dev->real_num_tx_queues;
                return hash;
        }

        if (skb->sk && skb->sk->sk_hash)
                hash = skb->sk->sk_hash;
        else
                hash = skb->protocol;

        hash = jhash_1word(hash, skb_tx_hashrnd);

        return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
}

If the NIC has some proprietary hash and selects rx queue 3 for feeding us
packets, it would be nice to also use tx queue 3 for transmit.

We would have to record in sk the rx queue chosen by the device,
for example when processing the SYN / SYN-ACK packet for tcp flows.
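
To make the two paths in skb_tx_hash() concrete, here is a small userspace sketch of just the arithmetic, with hypothetical helper names (`scale_hash_to_queue`, `fold_rx_queue`) that do not exist in the kernel. The first helper is the multiply-and-shift that spreads a 32-bit hash over the tx queues; the second is the subtraction loop applied to a recorded rx queue.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit hash onto [0, num_queues) without a division. */
static uint16_t scale_hash_to_queue(uint32_t hash, uint16_t num_queues)
{
	assert(num_queues > 0);
	return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
}

/* Fold a recorded rx queue number into the tx range by repeated subtraction. */
static uint16_t fold_rx_queue(uint32_t rx_queue, uint16_t num_queues)
{
	assert(num_queues > 0);
	while (rx_queue >= num_queues)
		rx_queue -= num_queues;
	return (uint16_t)rx_queue;
}
```

Only the second path preserves the "rx queue 3 gives tx queue 3" property; the first depends purely on the hash input, so it has no reason to agree with whatever queue the NIC picked on receive.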




* Re: [PATCH 1/2] Add an alternative cs89x0 driver
From: Kurt Van Dijck @ 2009-10-16  9:37 UTC (permalink / raw)
  To: Sascha Hauer
  Cc: netdev, Lennert Buytenhek, Ivo Clarysse, Gilles Chanteperdrix
In-Reply-To: <1240387172-21818-2-git-send-email-s.hauer@pengutronix.de>

On Wed, Apr 22, 2009 at 09:59:31AM +0200, Sascha Hauer wrote:
> The in-kernel driver is far beyond its age. It still does not use
> the driver model, and its mere presence in the kernel image prevents
> booting my board. The CS8900 still is in use on some embedded
> boards, so this patch adds an alternative driver to the tree
> designed to replace the old one.
> 
> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
> ---
>  drivers/net/Kconfig         |   12 +
>  drivers/net/Makefile        |    1 +
>  drivers/net/cirrus-cs89x0.c |  847 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 860 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/net/cirrus-cs89x0.c
[...]
> diff --git a/drivers/net/cirrus-cs89x0.c b/drivers/net/cirrus-cs89x0.c
[...]
> +static void
> +cirrus_set_receive_mode(struct net_device *ndev)
> +{
> +	struct cirrus_priv *priv = netdev_priv(ndev);
> +
> +	spin_lock(&priv->lock);
> +

I found that this function causes locking problems.
Using spin_lock_irqsave/spin_unlock_irqrestore solved them.

Can xxx_set_receive_mode be called with interrupts enabled?
I just want to make sure that I didn't break something elsewhere, and
I don't know ethernet (devices) that well.

Kurt


* Re: Enable syn cookies by default
From: Jarek Poplawski @ 2009-10-16  8:55 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: netdev
In-Reply-To: <b2cc26e40910150159q68ce555fs4b1683969d939d25@mail.gmail.com>

On 15-10-2009 10:59, Olaf van der Spek wrote:
> On Sat, Oct 10, 2009 at 3:01 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
>> Hi,
>>
>> I'm forwarding Debian feature request #520668.
>>
>> Could syn cookies be enabled by default?

Hi,

Alas, I can only give you a hint: while waiting for a better response,
you could search the archives of this list; AFAICR, a few (?) months
ago David Miller answered at least this first question.
(In short: they aren't up-to-date enough.)

Regards,
Jarek P.

>>
>> AFAIK syn cookies only get sent when the half-open TCP connection
>> queue is full. So stuff like window scaling should work fine in normal
>> situations.
>>
>> Speaking of which:
>> When the half-open TCP connection queue is full and syn cookies are
>> enabled, you get a message like "kernel: possible SYN flooding on port
>> 2710. Sending cookies."
>> However when syn cookies are disabled, you don't get any message (in
>> kern.log), although connections to your server are timing out.
>> Could such a message be added?
>> Maybe with a suggestion to increase the size of that queue or to
>> enable syn cookies.
>>
>> Greetings,
>>
>> Olaf
>>
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520668
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520667
>> https://bugs.launchpad.net/ubuntu/+bug/57091
>>
> 
> Somebody?



* Re: TCP_DEFER_ACCEPT is missing counter update
From: Julian Anastasov @ 2009-10-16  8:49 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev, eric.dumazet
In-Reply-To: <20091016050310.GA5574@1wt.eu>


	Hello,

On Fri, 16 Oct 2009, Willy Tarreau wrote:

> > 	This will need little change in inet_csk_reqsk_queue_prune()
> > but it saves SYN-ACK traffic during deferring period in the
> > common case when client sends ACK. If such compromise is
> > acceptable I can prepare and test some patch.
> 
> I would personally like this a lot ! This will satisfy people who
> expect it to establish at the end of the "TCP_DEFER_ACCEPT delay"
> as can be interpreted from the man page, will reduce the number of
> useless SYN-ACKs that annoy other people while still making no
> visible change for anyone who would rely on the current behaviour.

	OK, I don't have much time now; this is what I'm
going to test later today, and I can provide proper comments afterwards:

Signed-off-by: Julian Anastasov <ja@ssi.bg>

Patch 1: Accept connections after deferring period:

diff -urp v2.6.31/linux/net/ipv4/tcp_minisocks.c linux/net/ipv4/tcp_minisocks.c
--- v2.6.31/linux/net/ipv4/tcp_minisocks.c	2009-09-11 10:27:17.000000000 +0300
+++ linux/net/ipv4/tcp_minisocks.c	2009-10-16 10:29:19.000000000 +0300
@@ -641,8 +641,8 @@ struct sock *tcp_check_req(struct sock *
 	if (!(flg & TCP_FLAG_ACK))
 		return NULL;
 
-	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
-	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+	/* While TCP_DEFER_ACCEPT is active, drop bare ACK. */
+	if (req->retrans < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
 	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
 		inet_rsk(req)->acked = 1;
 		return NULL;

Patch 2: Do not resend SYN-ACK while waiting for data after ACK.
Also, do not drop acked request if sending of SYN-ACK fails.

diff -urp v2.6.31/linux/net/ipv4/inet_connection_sock.c linux/net/ipv4/inet_connection_sock.c
--- v2.6.31/linux/net/ipv4/inet_connection_sock.c	2009-06-13 10:53:58.000000000 +0300
+++ linux/net/ipv4/inet_connection_sock.c	2009-10-16 11:35:52.000000000 +0300
@@ -446,6 +446,28 @@ extern int sysctl_tcp_synack_retries;
 
 EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add);
 
+/* Decide when to expire the request and when to resend SYN-ACK */
+static inline void syn_ack_recalc(struct request_sock *req, const int thresh,
+				  const int max_retries,
+				  const u8 rskq_defer_accept,
+				  int *expire, int *resend)
+{
+	if (!rskq_defer_accept) {
+		*expire = req->retrans >= thresh;
+		*resend = 1;
+		return;
+	}
+	*expire = req->retrans >= thresh &&
+		  (!inet_rsk(req)->acked || req->retrans >= max_retries);
+	/*
+	 * Do not resend while waiting for data after ACK,
+	 * start to resend on end of deferring period to give
+	 * last chance for data or ACK to create established socket.
+	 */
+	*resend = !inet_rsk(req)->acked ||
+		  req->retrans >= rskq_defer_accept - 1;
+}
+
 void inet_csk_reqsk_queue_prune(struct sock *parent,
 				const unsigned long interval,
 				const unsigned long timeout,
@@ -501,9 +523,15 @@ void inet_csk_reqsk_queue_prune(struct s
 		reqp=&lopt->syn_table[i];
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
-				if ((req->retrans < thresh ||
-				     (inet_rsk(req)->acked && req->retrans < max_retries))
-				    && !req->rsk_ops->rtx_syn_ack(parent, req)) {
+				int expire = 0, resend = 0;
+
+				syn_ack_recalc(req, thresh, max_retries,
+					       queue->rskq_defer_accept,
+					       &expire, &resend);
+				if (!expire &&
+				    (!resend ||
+				     !req->rsk_ops->rtx_syn_ack(parent, req) ||
+				     inet_rsk(req)->acked)) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)
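
For readers who want to step through the expire/resend decision outside the kernel, here is a userspace model of the logic in syn_ack_recalc() from Patch 2. `struct req_model` and the `_model` suffix are hypothetical; only the two fields the decision reads are kept, and the arguments mirror those in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down request state: only what the decision reads. */
struct req_model {
	int retrans;	/* SYN-ACKs sent so far */
	bool acked;	/* a bare ACK has been seen */
};

static void syn_ack_recalc_model(const struct req_model *req, int thresh,
				 int max_retries, int defer_accept,
				 bool *expire, bool *resend)
{
	assert(req && expire && resend);
	if (!defer_accept) {
		/* No deferring: plain retry budget, always resend. */
		*expire = req->retrans >= thresh;
		*resend = true;
		return;
	}
	/* An acked request may outlive thresh, up to max_retries. */
	*expire = req->retrans >= thresh &&
		  (!req->acked || req->retrans >= max_retries);
	/* Stay quiet after the ACK until the deferring period is about to end. */
	*resend = !req->acked || req->retrans >= defer_accept - 1;
}
```

For example, with defer_accept = 3, thresh = 2 and max_retries = 5, an acked request at retrans = 0 is neither expired nor resent, while the same request at retrans = 2 is kept and gets one more SYN-ACK.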

Regards

--
Julian Anastasov <ja@ssi.bg>


* [PATCH 4/4 v2] net: Fix for dst_negative_advice
From: Krishna Kumar @ 2009-10-16  7:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, herbert, Krishna Kumar, dada1
In-Reply-To: <20091016072107.24384.17358.sendpatchset@localhost.localdomain>

From: Krishna Kumar <krkumar2@in.ibm.com>

dst_negative_advice() should check for changed dst and reset
sk_tx_queue_mapping accordingly. Pass sock to the callers of
dst_negative_advice.

(sk_reset_txq is defined just for use by dst_negative_advice. The
only way I could find to get around this is to move dst_negative_()
from dst.h to dst.c, include sock.h in dst.c, etc)

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 include/net/dst.h      |   12 ++++++++++--
 net/core/sock.c        |    6 ++++++
 net/dccp/timer.c       |    4 ++--
 net/decnet/af_decnet.c |    2 +-
 net/ipv4/tcp_timer.c   |    4 ++--
 5 files changed, 21 insertions(+), 7 deletions(-)

diff -ruNp org/include/net/dst.h new/include/net/dst.h
--- org/include/net/dst.h	2009-10-16 11:59:20.000000000 +0530
+++ new/include/net/dst.h	2009-10-16 12:01:03.000000000 +0530
@@ -222,11 +222,19 @@ static inline void dst_confirm(struct ds
 		neigh_confirm(dst->neighbour);
 }
 
-static inline void dst_negative_advice(struct dst_entry **dst_p)
+static inline void dst_negative_advice(struct dst_entry **dst_p,
+				       struct sock *sk)
 {
 	struct dst_entry * dst = *dst_p;
-	if (dst && dst->ops->negative_advice)
+	if (dst && dst->ops->negative_advice) {
 		*dst_p = dst->ops->negative_advice(dst);
+
+		if (dst != *dst_p) {
+			extern void sk_reset_txq(struct sock *sk);
+
+			sk_reset_txq(sk);
+		}
+	}
 }
 
 static inline void dst_link_failure(struct sk_buff *skb)
diff -ruNp org/net/core/sock.c new/net/core/sock.c
--- org/net/core/sock.c	2009-10-16 11:59:20.000000000 +0530
+++ new/net/core/sock.c	2009-10-16 12:01:03.000000000 +0530
@@ -352,6 +352,12 @@ discard_and_relse:
 }
 EXPORT_SYMBOL(sk_receive_skb);
 
+void sk_reset_txq(struct sock *sk)
+{
+	sk_record_tx_queue(sk, -1);
+}
+EXPORT_SYMBOL(sk_reset_txq);
+
 struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 {
 	struct dst_entry *dst = sk->sk_dst_cache;
diff -ruNp org/net/dccp/timer.c new/net/dccp/timer.c
--- org/net/dccp/timer.c	2009-10-16 11:59:20.000000000 +0530
+++ new/net/dccp/timer.c	2009-10-16 12:01:03.000000000 +0530
@@ -38,7 +38,7 @@ static int dccp_write_timeout(struct soc
 
 	if (sk->sk_state == DCCP_REQUESTING || sk->sk_state == DCCP_PARTOPEN) {
 		if (icsk->icsk_retransmits != 0)
-			dst_negative_advice(&sk->sk_dst_cache);
+			dst_negative_advice(&sk->sk_dst_cache, sk);
 		retry_until = icsk->icsk_syn_retries ?
 			    : sysctl_dccp_request_retries;
 	} else {
@@ -63,7 +63,7 @@ static int dccp_write_timeout(struct soc
 			   Golden words :-).
 		   */
 
-			dst_negative_advice(&sk->sk_dst_cache);
+			dst_negative_advice(&sk->sk_dst_cache, sk);
 		}
 
 		retry_until = sysctl_dccp_retries2;
diff -ruNp org/net/decnet/af_decnet.c new/net/decnet/af_decnet.c
--- org/net/decnet/af_decnet.c	2009-10-16 11:59:20.000000000 +0530
+++ new/net/decnet/af_decnet.c	2009-10-16 12:01:03.000000000 +0530
@@ -1955,7 +1955,7 @@ static int dn_sendmsg(struct kiocb *iocb
 	}
 
 	if ((flags & MSG_TRYHARD) && sk->sk_dst_cache)
-		dst_negative_advice(&sk->sk_dst_cache);
+		dst_negative_advice(&sk->sk_dst_cache, sk);
 
 	mss = scp->segsize_rem;
 	fctype = scp->services_rem & NSP_FC_MASK;
diff -ruNp org/net/ipv4/tcp_timer.c new/net/ipv4/tcp_timer.c
--- org/net/ipv4/tcp_timer.c	2009-10-16 11:59:20.000000000 +0530
+++ new/net/ipv4/tcp_timer.c	2009-10-16 12:01:03.000000000 +0530
@@ -141,14 +141,14 @@ static int tcp_write_timeout(struct sock
 
 	if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
 		if (icsk->icsk_retransmits)
-			dst_negative_advice(&sk->sk_dst_cache);
+			dst_negative_advice(&sk->sk_dst_cache, sk);
 		retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
 	} else {
 		if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
 			/* Black hole detection */
 			tcp_mtu_probing(icsk, sk);
 
-			dst_negative_advice(&sk->sk_dst_cache);
+			dst_negative_advice(&sk->sk_dst_cache, sk);
 		}
 
 		retry_until = sysctl_tcp_retries2;
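
The intent of the dst.h hunk can be modelled in a few lines of userspace C. This is only a sketch under assumptions: `struct dst_model`, `struct sk_model` and the callback type are hypothetical stand-ins, and the -1 sentinel follows the convention used in this v2 series.

```c
#include <assert.h>
#include <stddef.h>

struct dst_model { int id; };

struct sk_model {
	struct dst_model *dst_cache;
	int tx_queue_mapping;	/* -1 = nothing cached */
};

/* Hypothetical advice callback: returns the dst to keep, or NULL to drop it. */
typedef struct dst_model *(*advice_fn)(struct dst_model *);

static void negative_advice_model(struct sk_model *sk, advice_fn advise)
{
	struct dst_model *old;

	assert(sk != NULL);
	old = sk->dst_cache;
	if (old == NULL || advise == NULL)
		return;
	sk->dst_cache = advise(old);
	if (sk->dst_cache != old)
		sk->tx_queue_mapping = -1;	/* route changed: forget the queue */
}

static struct dst_model *keep_dst(struct dst_model *dst) { return dst; }
static struct dst_model *drop_dst(struct dst_model *dst) { (void)dst; return NULL; }
```

The cached queue survives when the callback keeps the same dst and is dropped only when the pointer actually changes, which is the condition the patch tests with `dst != *dst_p`.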


* [PATCH 3/4 v2] net: IPv6 changes
From: Krishna Kumar @ 2009-10-16  7:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, herbert, Krishna Kumar, dada1
In-Reply-To: <20091016072107.24384.17358.sendpatchset@localhost.localdomain>

From: Krishna Kumar <krkumar2@in.ibm.com>

IPv6: Reset sk_tx_queue_mapping when dst_cache is reset. Use existing
macro to do the work.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 net/ipv6/inet6_connection_sock.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff -ruNp org/net/ipv6/inet6_connection_sock.c new/net/ipv6/inet6_connection_sock.c
--- org/net/ipv6/inet6_connection_sock.c	2009-10-16 11:59:05.000000000 +0530
+++ new/net/ipv6/inet6_connection_sock.c	2009-10-16 12:00:20.000000000 +0530
@@ -168,8 +168,7 @@ struct dst_entry *__inet6_csk_dst_check(
 	if (dst) {
 		struct rt6_info *rt = (struct rt6_info *)dst;
 		if (rt->rt6i_flow_cache_genid != atomic_read(&flow_cache_genid)) {
-			sk->sk_dst_cache = NULL;
-			dst_release(dst);
+			__sk_dst_reset(sk);
 			dst = NULL;
 		}
 	}


* [PATCH 2/4 v2] net: Use sk_tx_queue_mapping for connected sockets
From: Krishna Kumar @ 2009-10-16  7:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, herbert, Krishna Kumar, dada1
In-Reply-To: <20091016072107.24384.17358.sendpatchset@localhost.localdomain>

From: Krishna Kumar <krkumar2@in.ibm.com>

For connected sockets, the first run of dev_pick_tx saves the
calculated txq in sk_tx_queue_mapping. This is not saved if
either skb rx was recorded, or if the device has a queue select
handler. Next iterations of dev_pick_tx uses the cached value of
sk_tx_queue_mapping.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 net/core/dev.c |   24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c	2009-10-16 11:58:37.000000000 +0530
+++ new/net/core/dev.c	2009-10-16 11:59:11.000000000 +0530
@@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash);
 static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 					struct sk_buff *skb)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
-	u16 queue_index = 0;
+	u16 queue_index;
+	struct sock *sk = skb->sk;
+
+	if (sk_tx_queue_recorded(sk)) {
+		queue_index = sk_get_tx_queue(sk);
+	} else {
+		const struct net_device_ops *ops = dev->netdev_ops;
 
-	if (ops->ndo_select_queue)
-		queue_index = ops->ndo_select_queue(dev, skb);
-	else if (dev->real_num_tx_queues > 1)
-		queue_index = skb_tx_hash(dev, skb);
+		if (ops->ndo_select_queue) {
+			queue_index = ops->ndo_select_queue(dev, skb);
+		} else {
+			queue_index = 0;
+			if (dev->real_num_tx_queues > 1)
+				queue_index = skb_tx_hash(dev, skb);
+
+			if (sk && sk->sk_dst_cache)
+				sk_record_tx_queue(sk, queue_index);
+		}
+	}
 
 	skb_set_queue_mapping(skb, queue_index);
 	return netdev_get_tx_queue(dev, queue_index);
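
A userspace model of the caching behaviour this patch adds to dev_pick_tx(), as a sketch only. `struct conn_model`, `slow_pick()` and `hash_calls` are hypothetical; the model leaves out ndo_select_queue and the jhash step, and uses a bare multiply-and-shift in place of skb_tx_hash().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct conn_model {
	int tx_queue_mapping;	/* -1 = nothing cached, as in this v2 series */
	bool has_dst;		/* stands in for sk->sk_dst_cache != NULL */
	uint32_t hash;
};

static unsigned int hash_calls;	/* counts slow-path runs, for illustration */

static uint16_t slow_pick(uint32_t hash, uint16_t num_queues)
{
	hash_calls++;
	return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
}

static uint16_t pick_tx_model(struct conn_model *sk, uint16_t num_queues)
{
	uint16_t queue_index = 0;

	assert(num_queues > 0);
	if (sk != NULL && sk->tx_queue_mapping >= 0)
		return (uint16_t)sk->tx_queue_mapping;	/* fast path */

	if (num_queues > 1)
		queue_index = slow_pick(sk ? sk->hash : 0, num_queues);

	if (sk != NULL && sk->has_dst)
		sk->tx_queue_mapping = queue_index;	/* cache for next call */

	return queue_index;
}
```

The second call for the same connected socket skips the hash entirely, and a socket without a cached dst never stores a queue, matching the `sk && sk->sk_dst_cache` condition in the diff.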


* [PATCH 1/4 v2] net: Introduce sk_tx_queue_mapping
From: Krishna Kumar @ 2009-10-16  7:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, herbert, Krishna Kumar, dada1
In-Reply-To: <20091016072107.24384.17358.sendpatchset@localhost.localdomain>

From: Krishna Kumar <krkumar2@in.ibm.com>

Introduce sk_tx_queue_mapping; and functions that set, test and get
this value. Reset sk_tx_queue_mapping to -1 whenever the dst cache
is set/reset, and in socket alloc & free (free probably doesn't need
it).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 include/net/sock.h |   21 +++++++++++++++++++++
 net/core/sock.c    |    7 ++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff -ruNp org/include/net/sock.h new/include/net/sock.h
--- org/include/net/sock.h	2009-10-14 10:36:52.000000000 +0530
+++ new/include/net/sock.h	2009-10-16 12:09:54.000000000 +0530
@@ -107,6 +107,7 @@ struct net;
  *	@skc_node: main hash linkage for various protocol lookup tables
  *	@skc_nulls_node: main hash linkage for UDP/UDP-Lite protocol
  *	@skc_refcnt: reference count
+ *	@skc_tx_queue_mapping: tx queue number for this connection
  *	@skc_hash: hash value used with various protocol lookup tables
  *	@skc_family: network address family
  *	@skc_state: Connection state
@@ -128,6 +129,7 @@ struct sock_common {
 		struct hlist_nulls_node skc_nulls_node;
 	};
 	atomic_t		skc_refcnt;
+	int			skc_tx_queue_mapping;
 
 	unsigned int		skc_hash;
 	unsigned short		skc_family;
@@ -215,6 +217,7 @@ struct sock {
 #define sk_node			__sk_common.skc_node
 #define sk_nulls_node		__sk_common.skc_nulls_node
 #define sk_refcnt		__sk_common.skc_refcnt
+#define sk_tx_queue_mapping	__sk_common.skc_tx_queue_mapping
 
 #define sk_copy_start		__sk_common.skc_hash
 #define sk_hash			__sk_common.skc_hash
@@ -1094,8 +1097,24 @@ static inline void sock_put(struct sock 
 extern int sk_receive_skb(struct sock *sk, struct sk_buff *skb,
 			  const int nested);
 
+static inline void sk_record_tx_queue(struct sock *sk, int tx_queue)
+{
+	sk->sk_tx_queue_mapping = tx_queue;
+}
+
+static inline int sk_get_tx_queue(const struct sock *sk)
+{
+	return sk->sk_tx_queue_mapping;
+}
+
+static inline bool sk_tx_queue_recorded(const struct sock *sk)
+{
+	return (sk && sk->sk_tx_queue_mapping >= 0);
+}
+
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
+	sk_record_tx_queue(sk, -1);
 	sk->sk_socket = sock;
 }
 
@@ -1152,6 +1171,7 @@ __sk_dst_set(struct sock *sk, struct dst
 {
 	struct dst_entry *old_dst;
 
+	sk_record_tx_queue(sk, -1);
 	old_dst = sk->sk_dst_cache;
 	sk->sk_dst_cache = dst;
 	dst_release(old_dst);
@@ -1170,6 +1190,7 @@ __sk_dst_reset(struct sock *sk)
 {
 	struct dst_entry *old_dst;
 
+	sk_record_tx_queue(sk, -1);
 	old_dst = sk->sk_dst_cache;
 	sk->sk_dst_cache = NULL;
 	dst_release(old_dst);
diff -ruNp org/net/core/sock.c new/net/core/sock.c
--- org/net/core/sock.c	2009-10-16 11:53:40.000000000 +0530
+++ new/net/core/sock.c	2009-10-16 12:07:00.000000000 +0530
@@ -357,6 +357,7 @@ struct dst_entry *__sk_dst_check(struct 
 	struct dst_entry *dst = sk->sk_dst_cache;
 
 	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
+		sk_record_tx_queue(sk, -1);
 		sk->sk_dst_cache = NULL;
 		dst_release(dst);
 		return NULL;
@@ -953,7 +954,8 @@ static void sock_copy(struct sock *nsk, 
 	void *sptr = nsk->sk_security;
 #endif
 	BUILD_BUG_ON(offsetof(struct sock, sk_copy_start) !=
-		     sizeof(osk->sk_node) + sizeof(osk->sk_refcnt));
+		     sizeof(osk->sk_node) + sizeof(osk->sk_refcnt) +
+		     sizeof(osk->sk_tx_queue_mapping));
 	memcpy(&nsk->sk_copy_start, &osk->sk_copy_start,
 	       osk->sk_prot->obj_size - offsetof(struct sock, sk_copy_start));
 #ifdef CONFIG_SECURITY_NETWORK
@@ -997,6 +999,7 @@ static struct sock *sk_prot_alloc(struct
 
 		if (!try_module_get(prot->owner))
 			goto out_free_sec;
+		sk_record_tx_queue(sk, -1);
 	}
 
 	return sk;
@@ -1016,6 +1019,8 @@ static void sk_prot_free(struct proto *p
 	struct kmem_cache *slab;
 	struct module *owner;
 
+	sk_record_tx_queue(sk, -1);
+
 	owner = prot->owner;
 	slab = prot->slab;
 


* [PATCH 0/4 v2] net: Implement fast TX queue selection
From: Krishna Kumar @ 2009-10-16  7:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, herbert, Krishna Kumar, dada1

From: Krishna Kumar <krkumar2@in.ibm.com>

Changelog [from v1]

1. Changed IPv6 code to call __sk_dst_reset() directly.
2. Removed the patch re-arranging ("encapsulating") __sk_dst_reset()

Multiqueue cards on routers/firewalls set skb->queue_mapping on
input which helps in faster xmit. Implement fast queue selection
for locally generated packets also, by saving the txq# for
connected sockets (in dev_pick_tx) and use it in subsequent
iterations. Locally generated packets for a connection will xmit
on the same txq, but routing & firewall loads should not be
affected by this patch. Tests show the distribution across txq's
for 1-4 netperf sessions is similar to existing code.


                   Testing & results:
                   ------------------

1. Cycles/Iter (C/I) used by dev_pick_tx:
         (B -> Billion,   M -> Million)
   |--------------|------------------------|------------------------|
   |              |          ORG           |          NEW           |
   |  Test        |--------|---------|-----|--------|---------|-----|
   |              | Cycles |  Iters  | C/I | Cycles | Iters   | C/I |
   |--------------|--------|---------|-----|--------|---------|-----|
   | [TCP_STREAM, | 3.98 B | 12.47 M | 320 | 1.95 B | 12.92 M | 152 |
   |  UDP_STREAM, |        |         |     |        |         |     |
   |  TCP_RR,     |        |         |     |        |         |     |
   |  UDP_RR]     |        |         |     |        |         |     |
   |--------------|--------|---------|-----|--------|---------|-----|
   | [TCP_STREAM, | 8.92 B | 29.66 M | 300 | 3.82 B | 38.88 M | 98  |
   |  TCP_RR,     |        |         |     |        |         |     |
   |  UDP_RR]     |        |         |     |        |         |     |
   |--------------|--------|---------|-----|--------|---------|-----|

2. Stress test (over 48 hours) : 1000 netperfs running combination
   of TCP_STREAM/RR, UDP_STREAM/RR (v4/6, NODELAY/~NODELAY for all
   tests), with some ssh sessions, reboots, modprobe -r driver, etc.

3. Performance test (10 hours): Single 10 hour netperf run of
   TCP_STREAM/RR, TCP_STREAM + NO_DELAY and UDP_RR. Results show an
   improvement in both performance and cpu utilization.

Tested on a 4-processor AMD Opteron 2.8 GHz system with 1GB memory,
10G Chelsio card. Each BW number is the sum of 3 iterations of
individual tests using 512, 16K, 64K & 128K I/O sizes, in Mb/s:

------------------------  TCP Tests  -----------------------
#procs  Org BW     New BW (%)     Org SD     New SD (%)
------------------------------------------------------------
1       77777.7    81011.0 (4.15)    42.3     40.2 (-5.11)
4       91599.2    91878.8 (.30)    955.9    919.3 (-3.83)
6       89533.3    91792.2 (2.52)  2262.0   2143.0 (-5.25)
8       87507.5    89161.9 (1.89)  4363.4   4073.6 (-6.64)
10      85152.4    85607.8 (.53)   6890.4   6851.2 (-.56)
------------------------------------------------------------

------------------------- TCP NO_DELAY Tests ---------------
#procs  Org BW     New BW (%)      Org SD      New SD (%)
------------------------------------------------------------
1       57001.9    57888.0 (1.55)     67.7      70.2 (3.75)
4       69555.1    69957.4 (.57)     823.0     834.3 (1.36)
6       71359.3    71918.7 (.78)    1740.8    1724.5 (-.93)
8       72577.6    72496.1 (-.11)   2955.4    2937.7 (-.59)
10      70829.6    71444.2 (.86)    4826.1    4673.4 (-3.16)
------------------------------------------------------------

----------------------- Request Response Tests --------------------
#procs  Org TPS     New TPS (%)      Org SD    New SD (%)
(1-10)
-------------------------------------------------------------------
TCP     1019245.9   1042626.4 (2.29) 16352.9   16459.8 (.65)
UDP     934598.64   942956.9  (.89)  11607.3   11593.2 (-.12)
-------------------------------------------------------------------
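
The derived columns in the tables above (C/I and the percentages in parentheses) can be recomputed with two one-line helpers. This is a sketch with hypothetical function names; because the printed inputs are rounded, recomputed values may differ slightly from the printed ones.

```c
#include <assert.h>

/* Relative change from the original to the new value, in percent. */
static double percent_change(double org, double new_val)
{
	assert(org != 0.0);
	return (new_val - org) / org * 100.0;
}

/* Cycles per iteration; both arguments are plain counts. */
static double cycles_per_iter(double cycles, double iters)
{
	assert(iters > 0.0);
	return cycles / iters;
}
```

For the single-process TCP row, percent_change(77777.7, 81011.0) gives about 4.16, in line with the printed 4.15; cycles_per_iter(3.98e9, 12.47e6) gives about 319 against the printed 320.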

Thanks,

- KK

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---


* Re: TCP_DEFER_ACCEPT is missing counter update
From: Willy Tarreau @ 2009-10-16  7:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, David Miller, netdev
In-Reply-To: <4AD81C0B.90804@gmail.com>

On Fri, Oct 16, 2009 at 09:08:59AM +0200, Eric Dumazet wrote:
(...)
> > Yes it could differ if a pure ACK is lost between the client and the server,
> > but in my opinion what is important is not to precisely account the number
> > of ACKs to ensure we wake up exactly after XXX ACKs received, but that in
> > most common situations we avoid to wake up too early.
> > 
> 
> We are basically saying the same thing, but you misunderstood me. I was concerned about
> one lost (server -> client SYN-ACK), not a lost (client -> server ACK), which is fine
> (even without playing with TCP_DEFER_ACCEPT at all).
> 
> In this case, if we do the retrans test, we'll accept the first (client -> server ACK)
> and wake up the application, while most probably we'll receive the client request
> a few milliseconds later.

OK I get your point. We can detect that though, as Julian explained it, with
the ->acked field. It indicates we got an ACK, which proves the SYN-ACK was
received. At first glance, I think that Julian's algorithm explained at the
end of his mail exactly covers all cases without using any additional field,
though this is not an issue anyway.

> > Also, keep in mind that the TCP_DEFER_ACCEPT parameter is passed in number
> > of seconds by the application, which are in turn converted to a number of
> > retransmits based on our own timer, which means that our SYN-ACK counter
> > is what most closely matches the application's expected delay, even if an
> > ACK from the client gets lost in between or if a client's stack retransmits
> > pure ACKs very fast for any implementation-specific reason.
> > 
> 
> Well, this is why converting the application delay (sockopt() argument) from seconds
> to a number of SYN-ACK retransmits is suboptimal and error prone.

I agree, but it allows the application to be unaware of retransmit timers.

> This might be changed to match what the documentation states: a number of seconds,
> or even better a number of milliseconds (new TCP_DEFER_ACCEPT_MS setsockopt cmd),
> because a high performance server won't play with > 1 sec values anyway.

It would be nice but it would require a new timer. Current implementation
does not need any and is efficient enough for most common cases. In fact it
would have been better to simply be able to specify that we want to skip one
empty ACK (or X empty ACKs). But let's make use of what we currently have,
with your (or Julian's) changes, it should cover almost all usages without
changing semantics for applications.

Regards,
Willy



* Re: TCP_DEFER_ACCEPT is missing counter update
From: Eric Dumazet @ 2009-10-16  7:08 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Julian Anastasov, David Miller, netdev
In-Reply-To: <20091016061806.GD5574@1wt.eu>

Willy Tarreau wrote:
> On Fri, Oct 16, 2009 at 08:05:19AM +0200, Eric Dumazet wrote:
>>> Couldn't we just rely on the retrans vs rskq_defer_accept comparison ?
>>>
>> In this case, we lose TCP_DEFER_ACCEPT advantage in case one SYN-ACK was dropped
>> by the network : We wakeup the listening server when first ACK comes from client,
>> instead of really wait the request.
>>
>> I think being able to count pure-acks would be slightly better, and cost nothing.
>>
>>
>> retrans is the number of SYN-RECV (re)sent, while req_acks would count number of
>> pure ACK received.
>>
>> Those numbers, in an ideal world should be related, but could differ in real world ?
> 
> Yes it could differ if a pure ACK is lost between the client and the server,
> but in my opinion what is important is not to precisely account the number
> of ACKs to ensure we wake up exactly after XXX ACKs received, but that in
> most common situations we avoid to wake up too early.
> 

We are basically saying the same thing, but you misunderstood me. I was concerned about
one lost (server -> client SYN-ACK), not a lost (client -> server ACK), which is fine
(even without playing with TCP_DEFER_ACCEPT at all).

In this case, if we do the retrans test, we'll accept the first (client -> server ACK)
and wake up the application, while most probably we'll receive the client request
a few milliseconds later.

> Also, keep in mind that the TCP_DEFER_ACCEPT parameter is passed in number
> of seconds by the application, which are in turn converted to a number of
> retransmits based on our own timer, which means that our SYN-ACK counter
> is what most closely matches the application's expected delay, even if an
> ACK from the client gets lost in between or if a client's stack retransmits
> pure ACKs very fast for any implementation-specific reason.
> 

Well, this is why converting the application delay (sockopt() argument) from seconds
to a number of SYN-ACK retransmits is suboptimal and error prone.

This might be changed to match what the documentation states: a number of seconds,
or even better a number of milliseconds (new TCP_DEFER_ACCEPT_MS setsockopt cmd),
because a high performance server won't play with > 1 sec values anyway.
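
To illustrate how coarse a seconds-to-retransmits mapping is, here is a simplified model of such a conversion. It is a sketch under assumptions, not the kernel's code: the function name is hypothetical, and the doubling timeout, the 3 second initial value used in the examples, and the cap of 32 are all assumptions of this model.

```c
#include <assert.h>

/*
 * Find the smallest k such that seconds <= init_timeout << k, and return
 * k + 1 as the retransmit count (0 when deferring is disabled).
 */
static int defer_secs_to_retrans(int seconds, int init_timeout)
{
	int retrans = 0;

	assert(init_timeout > 0);
	if (seconds <= 0)
		return 0;
	/* Widen before shifting so the doubled timeout cannot overflow. */
	while (retrans < 32 && seconds > ((long long)init_timeout << retrans))
		retrans++;
	return retrans + 1;
}
```

Under these assumptions, requests of 1, 2 and 3 seconds all map to the same count of 1, and 4 to 6 seconds all map to 2, so an application cannot express a sub-second or even a per-second delay this way.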


* e1000_clean_tx_irq: Detected Tx Unit Hang
From: Holger Kiehl @ 2009-10-16  6:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

Hello

I have received the following error on a busy network:

    Oct 15 22:01:13 hermes kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
    Oct 15 22:01:13 hermes kernel:  Tx Queue             <0>
    Oct 15 22:01:13 hermes kernel:  TDH                  <ff>
    Oct 15 22:01:13 hermes kernel:  TDT                  <ee>
    Oct 15 22:01:13 hermes kernel:  next_to_use          <ee>
    Oct 15 22:01:13 hermes kernel:  next_to_clean        <fe>
    Oct 15 22:01:13 hermes kernel: buffer_info[next_to_clean]
    Oct 15 22:01:13 hermes kernel:  time_stamp           <1031cfe6d>
    Oct 15 22:01:13 hermes kernel:  next_to_watch        <2>
    Oct 15 22:01:13 hermes kernel:  jiffies              <1031d0000>
    Oct 15 22:01:13 hermes kernel:  next_to_watch.status <0>
    Oct 15 22:01:15 hermes kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
    Oct 15 22:01:15 hermes kernel:  Tx Queue             <0>
    Oct 15 22:01:15 hermes kernel:  TDH                  <ff>
    Oct 15 22:01:15 hermes kernel:  TDT                  <ee>
    Oct 15 22:01:15 hermes kernel:  next_to_use          <ee>
    Oct 15 22:01:15 hermes kernel:  next_to_clean        <fe>
    Oct 15 22:01:15 hermes kernel: buffer_info[next_to_clean]
    Oct 15 22:01:15 hermes kernel:  time_stamp           <1031cfe6d>
    Oct 15 22:01:15 hermes kernel:  next_to_watch        <2>
    Oct 15 22:01:15 hermes kernel:  jiffies              <1031d01f4>
    Oct 15 22:01:15 hermes kernel:  next_to_watch.status <0>
    Oct 15 22:01:17 hermes kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
    Oct 15 22:01:17 hermes kernel:  Tx Queue             <0>
    Oct 15 22:01:17 hermes kernel:  TDH                  <ff>
    Oct 15 22:01:17 hermes kernel:  TDT                  <ee>
    Oct 15 22:01:17 hermes kernel:  next_to_use          <ee>
    Oct 15 22:01:17 hermes kernel:  next_to_clean        <fe>
    Oct 15 22:01:17 hermes kernel: buffer_info[next_to_clean]
    Oct 15 22:01:17 hermes kernel:  time_stamp           <1031cfe6d>
    Oct 15 22:01:17 hermes kernel:  next_to_watch        <2>
    Oct 15 22:01:17 hermes kernel:  jiffies              <1031d03e8>
    Oct 15 22:01:17 hermes kernel:  next_to_watch.status <0>
    Oct 15 22:01:18 hermes kernel: ------------[ cut here ]------------
    Oct 15 22:01:18 hermes kernel: WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x143/0x1eb()
    Oct 15 22:01:18 hermes kernel: Hardware name: PRIMERGY RX300 S4
    Oct 15 22:01:18 hermes kernel: NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out
    Oct 15 22:01:18 hermes kernel: Modules linked in: coretemp ipmi_devintf ipmi_si ipmi_msghandler bonding nf_conntrack_ftp binfmt_misc usbhid i2c_i801 i5000_edac i2c_core i5k_amb uhci_hcd ehci_hcd sg usbcore [last unloaded: microcode]
    Oct 15 22:01:18 hermes kernel: Pid: 0, comm: swapper Not tainted 2.6.31.4 #4
    Oct 15 22:01:18 hermes kernel: Call Trace:
    Oct 15 22:01:18 hermes kernel: <IRQ>  [<ffffffff810686bf>] warn_slowpath_common+0x88/0xb6
    Oct 15 22:01:18 hermes kernel: [<ffffffff81068770>] warn_slowpath_fmt+0x4b/0x61
    Oct 15 22:01:18 hermes kernel: [<ffffffff813995fb>] ? netdev_drivername+0x52/0x70
    Oct 15 22:01:18 hermes kernel: [<ffffffff813ac5dc>] dev_watchdog+0x143/0x1eb
    Oct 15 22:01:18 hermes kernel: [<ffffffff8107ba1f>] ? __queue_work+0x44/0x61
    Oct 15 22:01:18 hermes kernel: [<ffffffff810731d1>] run_timer_softirq+0x1a8/0x238
    Oct 15 22:01:18 hermes kernel: [<ffffffff8108af33>] ? clockevents_program_event+0x88/0xa5
    Oct 15 22:01:18 hermes kernel: [<ffffffff8106e6db>] __do_softirq+0xab/0x160
    Oct 15 22:01:18 hermes kernel: [<ffffffff8102cdac>] call_softirq+0x1c/0x28
    Oct 15 22:01:18 hermes kernel: [<ffffffff8102ee55>] do_softirq+0x51/0xae
    Oct 15 22:01:18 hermes kernel: [<ffffffff8106e2f4>] irq_exit+0x52/0xa3
    Oct 15 22:01:18 hermes kernel: [<ffffffff810442e7>] smp_apic_timer_interrupt+0x9c/0xc1
    Oct 15 22:01:18 hermes kernel: [<ffffffff8102c773>] apic_timer_interrupt+0x13/0x20
    Oct 15 22:01:18 hermes kernel: <EOI>  [<ffffffff81274aea>] ? acpi_idle_enter_simple+0x17e/0x1c6
    Oct 15 22:01:18 hermes kernel: [<ffffffff81274ae3>] ? acpi_idle_enter_simple+0x177/0x1c6
    Oct 15 22:01:18 hermes kernel: [<ffffffff8137a924>] ? cpuidle_idle_call+0x9b/0xe7
    Oct 15 22:01:18 hermes kernel: [<ffffffff8102aeb4>] ? cpu_idle+0xb0/0xf3
    Oct 15 22:01:18 hermes kernel: [<ffffffff81421b36>] ? start_secondary+0x1b8/0x1d3
    Oct 15 22:01:18 hermes kernel: ---[ end trace 5d760977cd95430f ]---
    Oct 15 22:01:18 hermes kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
    Oct 15 22:01:18 hermes kernel: bonding: bond0: making interface eth2 the new active one.
    Oct 15 22:01:21 hermes kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
    Oct 15 22:01:21 hermes kernel: bonding: bond0: link status definitely up for interface eth0.
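
The jiffies and time_stamp values in the reports above allow a rough estimate
of how long the buffer at next_to_clean had been pending. The sketch below is
plain arithmetic on the logged numbers; the helper names are hypothetical, and
the tick rate is only inferred from two log lines whose syslog timestamps have
one second resolution.

```c
#include <stdint.h>

/* Tick rate estimated from two jiffies samples taken wall_secs apart. */
uint64_t estimate_hz(uint64_t j_first, uint64_t j_second, uint64_t wall_secs)
{
	return (j_second - j_first) / wall_secs;
}

/* Age in milliseconds of a buffer stamped at time_stamp, observed at now. */
uint64_t stamp_age_ms(uint64_t now, uint64_t time_stamp, uint64_t hz)
{
	return (now - time_stamp) * 1000 / hz;
}
```

The first two reports are 2 seconds apart and their jiffies differ by 0x1f4
(500), which suggests about 250 ticks per second. With that rate, the buffer
is roughly 1.6 s old in the first report (403 jiffies) and roughly 5.6 s old
in the third (1403 jiffies), while time_stamp, TDH and TDT never change.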

This happened with a plain kernel.org kernel 2.6.31.4. The Ethernet card
is a PCI-X card (i.e. using the e1000 driver); here is the output of lspci:

    05:04.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
            Subsystem: Intel Corporation Device 118a
            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
            Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Latency: 64 (63750ns min), Cache Line Size: 64 bytes
            Interrupt: pin A routed to IRQ 24
            Region 0: Memory at f9280000 (64-bit, non-prefetchable) [size=128K]
            Region 2: Memory at f9240000 (64-bit, non-prefetchable) [size=256K]
            Region 4: I/O ports at 4000 [size=64]
            [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
            Capabilities: [dc] Power Management version 2
                    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
            Capabilities: [e4] PCI-X non-bridge device
                    Command: DPERE- ERO+ RBC=512 OST=1
                    Status: Dev=05:04.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
            Capabilities: [f0] MSI: Enable- Count=1/1 Maskable- 64bit+
                    Address: 0000000000000000  Data: 0000
            Kernel driver in use: e1000

    05:04.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
            Subsystem: Intel Corporation Device 118a
            Physical Slot: 4
            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
            Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Latency: 64 (63750ns min), Cache Line Size: 64 bytes
            Interrupt: pin B routed to IRQ 25
            Region 0: Memory at f92a0000 (64-bit, non-prefetchable) [size=128K]
            Region 2: Memory at f9300000 (64-bit, non-prefetchable) [size=256K]
            Region 4: I/O ports at 4400 [size=64]
            [virtual] Expansion ROM at c0040000 [disabled] [size=256K]
            Capabilities: [dc] Power Management version 2
                    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
            Capabilities: [e4] PCI-X non-bridge device
                    Command: DPERE- ERO+ RBC=512 OST=1
                    Status: Dev=05:04.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
            Capabilities: [f0] MSI: Enable- Count=1/1 Maskable- 64bit+
                    Address: 0000000000000000  Data: 0000
            Kernel driver in use: e1000

Googling, I see that there were lots of such reports in the past, but none
recently. From those reports I gather that one should disable
tcp-segmentation-offload, which I did as a first step. Is there anything else
I can do? Or what other information can I provide to help solve
this problem?

Thanks,
Holger

PS: Please CC me since I am not subscribed.

^ permalink raw reply

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Willy Tarreau @ 2009-10-16  6:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, David Miller, netdev
In-Reply-To: <4AD80D1F.3090601@gmail.com>

On Fri, Oct 16, 2009 at 08:05:19AM +0200, Eric Dumazet wrote:
> > Couldn't we just rely on the retrans vs rskq_defer_accept comparison ?
> > 
> 
> In this case, we lose the TCP_DEFER_ACCEPT advantage in case one SYN-ACK was dropped
> by the network: we wake up the listening server when the first ACK comes from the client,
> instead of really waiting for the request.
> 
> I think being able to count pure ACKs would be slightly better, and cost nothing.
> 
> 
> retrans is the number of SYN-ACKs (re)sent, while req_acks would count the number of
> pure ACKs received.
> 
> Those numbers, in an ideal world, should be related, but could they differ in the real world?

Yes, they could differ if a pure ACK is lost between the client and the server,
but in my opinion what is important is not to precisely count the number
of ACKs to ensure we wake up exactly after XXX ACKs received, but that in
most common situations we avoid waking up too early.

Also, keep in mind that the TCP_DEFER_ACCEPT parameter is passed as a number
of seconds by the application, which is in turn converted to a number of
retransmits based on our own timer, which means that our SYN-ACK counter
is what most closely matches the application's expected delay, even if an
ACK from the client gets lost in between or if a client's stack retransmits
pure ACKs very fast for any implementation-specific reason.
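
A minimal sketch of the retrans vs rskq_defer_accept comparison discussed
above, assuming a hypothetical per-request structure. The names are
illustrative only and this is not the kernel's actual code path.

```c
#include <stdbool.h>

/* Hypothetical per-request state: number of SYN-ACKs sent so far. */
struct demo_req {
	unsigned int retrans;
};

/*
 * Decide whether an incoming ACK should wake the listener.
 * defer_accept is the retransmit budget derived from TCP_DEFER_ACCEPT;
 * 0 means the option is off.
 */
bool demo_should_wake(const struct demo_req *req, unsigned int defer_accept,
		      bool ack_carries_data)
{
	if (ack_carries_data || defer_accept == 0)
		return true;

	/* Pure ACK: keep waiting until the retransmit budget is used up. */
	return req->retrans >= defer_accept;
}
```

With this test, a lost or duplicated pure ACK does not change the outcome,
because only our own SYN-ACK counter is consulted.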

Regards,
Willy


^ permalink raw reply

* [PATCH NEXT 6/7] netxen: fix error codes in for tools access
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1255673353-31797-1-git-send-email-dhananjay@netxen.com>

Use -EIO or -EINVAL as error codes, these can get passed up
to applications (tools).

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic_hw.c   |   49 +++++++++++++++++----------------
 drivers/net/netxen/netxen_nic_init.c |    6 ++++
 2 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index ce07d13..320ded9 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -332,7 +332,7 @@ netxen_pcie_sem_lock(struct netxen_adapter *adapter, int sem, u32 id_reg)
 		if (done == 1)
 			break;
 		if (++timeout >= NETXEN_PCIE_SEM_TIMEOUT)
-			return -1;
+			return -EIO;
 		msleep(1);
 	}
 
@@ -1083,7 +1083,7 @@ netxen_nic_pci_set_crbwindow_128M(struct netxen_adapter *adapter,
 }
 
 /*
- * Return -1 if off is not valid,
+ * Returns < 0 if off is not valid,
  *	 1 if window access is needed. 'off' is set to offset from
  *	   CRB space in 128M pci map
  *	 0 if no window access is needed. 'off' is set to 2M addr
@@ -1096,7 +1096,7 @@ netxen_nic_pci_get_crb_addr_2M(struct netxen_adapter *adapter, ulong *off)
 
 
 	if (*off >= NETXEN_CRB_MAX)
-		return -1;
+		return -EINVAL;
 
 	if (*off >= NETXEN_PCI_CAMQM && (*off < NETXEN_PCI_CAMQM_2M_END)) {
 		*off = (*off - NETXEN_PCI_CAMQM) + NETXEN_PCI_CAMQM_2M_BASE +
@@ -1105,7 +1105,7 @@ netxen_nic_pci_get_crb_addr_2M(struct netxen_adapter *adapter, ulong *off)
 	}
 
 	if (*off < NETXEN_PCI_CRBSPACE)
-		return -1;
+		return -EINVAL;
 
 	*off -= NETXEN_PCI_CRBSPACE;
 
@@ -1220,25 +1220,26 @@ netxen_nic_hw_write_wx_2M(struct netxen_adapter *adapter, ulong off, u32 data)
 
 	rv = netxen_nic_pci_get_crb_addr_2M(adapter, &off);
 
-	if (rv == -1) {
-		printk(KERN_ERR "%s: invalid offset: 0x%016lx\n",
-				__func__, off);
-		dump_stack();
-		return -1;
+	if (rv == 0) {
+		writel(data, (void __iomem *)off);
+		return 0;
 	}
 
-	if (rv == 1) {
+	if (rv > 0) {
+		/* indirect access */
 		write_lock_irqsave(&adapter->ahw.crb_lock, flags);
 		crb_win_lock(adapter);
 		netxen_nic_pci_set_crbwindow_2M(adapter, &off);
 		writel(data, (void __iomem *)off);
 		crb_win_unlock(adapter);
 		write_unlock_irqrestore(&adapter->ahw.crb_lock, flags);
-	} else
-		writel(data, (void __iomem *)off);
-
+		return 0;
+	}
 
-	return 0;
+	dev_err(&adapter->pdev->dev,
+			"%s: invalid offset: 0x%016lx\n", __func__, off);
+	dump_stack();
+	return -EIO;
 }
 
 static u32
@@ -1250,24 +1251,24 @@ netxen_nic_hw_read_wx_2M(struct netxen_adapter *adapter, ulong off)
 
 	rv = netxen_nic_pci_get_crb_addr_2M(adapter, &off);
 
-	if (rv == -1) {
-		printk(KERN_ERR "%s: invalid offset: 0x%016lx\n",
-				__func__, off);
-		dump_stack();
-		return -1;
-	}
+	if (rv == 0)
+		return readl((void __iomem *)off);
 
-	if (rv == 1) {
+	if (rv > 0) {
+		/* indirect access */
 		write_lock_irqsave(&adapter->ahw.crb_lock, flags);
 		crb_win_lock(adapter);
 		netxen_nic_pci_set_crbwindow_2M(adapter, &off);
 		data = readl((void __iomem *)off);
 		crb_win_unlock(adapter);
 		write_unlock_irqrestore(&adapter->ahw.crb_lock, flags);
-	} else
-		data = readl((void __iomem *)off);
+		return data;
+	}
 
-	return data;
+	dev_err(&adapter->pdev->dev,
+			"%s: invalid offset: 0x%016lx\n", __func__, off);
+	dump_stack();
+	return -1;
 }
 
 /* window 1 registers only */
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index a524844..d8c4b70 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -832,6 +832,12 @@ void netxen_request_firmware(struct netxen_adapter *adapter)
 		goto request_fw;
 	}
 
+	if (NX_IS_REVISION_P3P(adapter->ahw.revision_id)) {
+		/* No file firmware for the time being */
+		fw_type = NX_FLASH_ROMIMAGE;
+		goto done;
+	}
+
 	fw_type = netxen_p3_has_mn(adapter) ?
 		NX_P3_MN_ROMIMAGE : NX_P3_CT_ROMIMAGE;
 
-- 
1.6.0.2
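
The return convention this patch settles on (negative errno for an invalid
offset, 0 for direct access, positive for windowed access) can be sketched in
user space as follows. The limits and the demo_ names are hypothetical and
chosen only for the demonstration.

```c
#include <errno.h>

/* Hypothetical address-space limits, chosen only for the demonstration. */
#define DEMO_CRB_MAX     0x1000UL
#define DEMO_DIRECT_END  0x0800UL

/* < 0: invalid offset, 0: direct access, > 0: windowed access needed. */
int demo_get_crb_addr(unsigned long off)
{
	if (off >= DEMO_CRB_MAX)
		return -EINVAL;
	return off < DEMO_DIRECT_END ? 0 : 1;
}

/* Write path: handle both valid cases first, fall through to the error. */
int demo_write(unsigned long off, int *used_window)
{
	int rv = demo_get_crb_addr(off);

	if (rv == 0) {
		*used_window = 0;	/* direct access */
		return 0;
	}
	if (rv > 0) {
		*used_window = 1;	/* indirect access through a window */
		return 0;
	}
	return -EIO;
}
```

Note that the read helper in the patch returns u32, so it cannot carry a
negative errno and still returns -1 on an invalid offset.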


^ permalink raw reply related

* [PATCH NEXT 7/7] netxen: sysfs control for auto firmware recovery
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, Narender Kumar
In-Reply-To: <1255673353-31797-1-git-send-email-dhananjay@netxen.com>

From: Narender Kumar <narender.kumar@qlogic.com>

Firmware hang detection and recovery (reset) need to
be disabled for diagnostic tools, which can run
some disruptive tests.

This adds a driver-level control that lets diag tools
turn this feature off.

Signed-off-by: Narender Kumar <narender.kumar@qlogic.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic.h      |    3 ++
 drivers/net/netxen/netxen_nic_main.c |   50 +++++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index ae4bc7b..9936f92 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -1044,6 +1044,9 @@ typedef struct {
 #define LINKEVENT_LINKSPEED_MBPS	0
 #define LINKEVENT_LINKSPEED_ENCODED	1
 
+#define AUTO_FW_RESET_ENABLED	0xEF10AF12
+#define AUTO_FW_RESET_DISABLED	0xDCBAAF12
+
 /* firmware response header:
  *	63:58 - message type
  *	57:56 - owner
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 30d9afe..bfe8fcc 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -52,6 +52,8 @@ static int use_msi = 1;
 
 static int use_msi_x = 1;
 
+static unsigned long auto_fw_reset = AUTO_FW_RESET_ENABLED;
+
 /* Local functions to NetXen NIC driver */
 static int __devinit netxen_nic_probe(struct pci_dev *pdev,
 		const struct pci_device_id *ent);
@@ -2264,7 +2266,8 @@ netxen_check_health(struct netxen_adapter *adapter)
 	dev_info(&netdev->dev, "firmware hang detected\n");
 
 detach:
-	if (!test_and_set_bit(__NX_RESETTING, &adapter->state))
+	if ((auto_fw_reset == AUTO_FW_RESET_ENABLED) &&
+			!test_and_set_bit(__NX_RESETTING, &adapter->state))
 		netxen_schedule_work(adapter, netxen_detach_work, 0);
 	return 1;
 }
@@ -2496,6 +2499,41 @@ static struct bin_attribute bin_attr_mem = {
 	.write = netxen_sysfs_write_mem,
 };
 
+static ssize_t
+netxen_store_auto_fw_reset(struct module_attribute *mattr,
+		struct module *mod, const char *buf, size_t count)
+
+{
+	unsigned long new;
+
+	if (strict_strtoul(buf, 16, &new))
+		return -EINVAL;
+
+	if ((new == AUTO_FW_RESET_ENABLED) || (new == AUTO_FW_RESET_DISABLED)) {
+		auto_fw_reset = new;
+		return count;
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t
+netxen_show_auto_fw_reset(struct module_attribute *mattr,
+		struct module *mod, char *buf)
+
+{
+	if (auto_fw_reset == AUTO_FW_RESET_ENABLED)
+		return sprintf(buf, "enabled\n");
+	else
+		return sprintf(buf, "disabled\n");
+}
+
+static struct module_attribute mod_attr_fw_reset = {
+	.attr = {.name = "auto_fw_reset", .mode = (S_IRUGO | S_IWUSR)},
+	.show = netxen_show_auto_fw_reset,
+	.store = netxen_store_auto_fw_reset,
+};
+
 static void
 netxen_create_sysfs_entries(struct netxen_adapter *adapter)
 {
@@ -2700,12 +2738,18 @@ static struct pci_driver netxen_driver = {
 
 static int __init netxen_init_module(void)
 {
+	struct module *mod = THIS_MODULE;
+
 	printk(KERN_INFO "%s\n", netxen_nic_driver_string);
 
 #ifdef CONFIG_INET
 	register_netdevice_notifier(&netxen_netdev_cb);
 	register_inetaddr_notifier(&netxen_inetaddr_cb);
 #endif
+	
+	if (sysfs_create_file(&mod->mkobj.kobj, &mod_attr_fw_reset.attr))
+		printk(KERN_ERR "%s: Failed to create auto_fw_reset "
+				"sysfs entry.", netxen_nic_driver_name);
 
 	return pci_register_driver(&netxen_driver);
 }
@@ -2714,6 +2758,10 @@ module_init(netxen_init_module);
 
 static void __exit netxen_exit_module(void)
 {
+	struct module *mod = THIS_MODULE;
+
+	sysfs_remove_file(&mod->mkobj.kobj, &mod_attr_fw_reset.attr);
+
 	pci_unregister_driver(&netxen_driver);
 
 #ifdef CONFIG_INET
-- 
1.6.0.2
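
For tool authors, the store logic above can be modelled in user space roughly
as follows. This is a sketch under assumptions: it uses strtoul() in place of
strict_strtoul(), tolerates one trailing newline, and the function names are
hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

#define AUTO_FW_RESET_ENABLED	0xEF10AF12UL
#define AUTO_FW_RESET_DISABLED	0xDCBAAF12UL

static unsigned long auto_fw_reset = AUTO_FW_RESET_ENABLED;

/* Accept only one of the two magic values, written in hexadecimal. */
int demo_store_auto_fw_reset(const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 16);
	if (errno || end == buf)
		return -EINVAL;

	/* Allow a single trailing newline, as written by "echo". */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val != AUTO_FW_RESET_ENABLED && val != AUTO_FW_RESET_DISABLED)
		return -EINVAL;

	auto_fw_reset = val;
	return 0;
}

int demo_auto_fw_reset_enabled(void)
{
	return auto_fw_reset == AUTO_FW_RESET_ENABLED;
}
```

Using two unrelated magic numbers instead of 0 and 1 makes it unlikely that
recovery gets switched off by an accidental write.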


^ permalink raw reply related

* [PATCH NEXT 5/7] netxen: onchip memory access change
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, Amit Kumar Salecha
In-Reply-To: <1255673353-31797-1-git-send-email-dhananjay@netxen.com>

From: Amit Kumar Salecha <amit@netxen.com>

Add support for a different windowing scheme for on-chip
memory in future chip revisions. This is required by
diagnostic tools.

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic_hdr.h  |    3 +++
 drivers/net/netxen/netxen_nic_hw.c   |   21 ++++++++++++++-------
 drivers/net/netxen/netxen_nic_main.c |    6 +++++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_hdr.h b/drivers/net/netxen/netxen_nic_hdr.h
index d40fe33..7386a7c 100644
--- a/drivers/net/netxen/netxen_nic_hdr.h
+++ b/drivers/net/netxen/netxen_nic_hdr.h
@@ -867,6 +867,9 @@ enum {
 		(PCIX_SN_WINDOW_F0 + (0x20 * (func))) :\
 		(PCIX_SN_WINDOW_F4 + (0x10 * ((func)-4))))
 
+#define PCIX_OCM_WINDOW		(0x10800)
+#define PCIX_OCM_WINDOW_REG(func)	(PCIX_OCM_WINDOW + 0x20 * (func))
+
 #define PCIX_TARGET_STATUS	(0x10118)
 #define PCIX_TARGET_STATUS_F1	(0x10160)
 #define PCIX_TARGET_STATUS_F2	(0x10164)
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 52a2f2d..ce07d13 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -31,6 +31,7 @@
 #define MASK(n) ((1ULL<<(n))-1)
 #define MN_WIN(addr) (((addr & 0x1fc0000) >> 1) | ((addr >> 25) & 0x3ff))
 #define OCM_WIN(addr) (((addr & 0x1ff0000) >> 1) | ((addr >> 25) & 0x3ff))
+#define OCM_WIN_P3P(addr) (addr & 0xffc0000)
 #define MS_WIN(addr) (addr & 0x0ffc0000)
 
 #define GET_MEM_OFFS_2M(addr) (addr & MASK(18))
@@ -1347,13 +1348,19 @@ netxen_nic_pci_set_window_2M(struct netxen_adapter *adapter,
 		return -EIO;
 	}
 
-	window = OCM_WIN(addr);
-	writel(window, adapter->ahw.ocm_win_crb);
-	win_read = readl(adapter->ahw.ocm_win_crb);
-	if ((win_read >> 7) != window) {
-		if (printk_ratelimit())
-			dev_warn(&pdev->dev, "failed to set OCM window\n");
-		return -EIO;
+	if (NX_IS_REVISION_P3P(adapter->ahw.revision_id)) {
+		window = OCM_WIN_P3P(addr);
+		writel(window, adapter->ahw.ocm_win_crb);
+	} else {
+		window = OCM_WIN(addr);
+		writel(window, adapter->ahw.ocm_win_crb);
+		win_read = readl(adapter->ahw.ocm_win_crb);
+		if ((win_read >> 7) != window) {
+			if (printk_ratelimit())
+				dev_warn(&pdev->dev,
+						"failed to set OCM window\n");
+			return -EIO;
+		}
 	}
 
 	adapter->ahw.ocm_win = window;
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 2d772dd..30d9afe 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -649,7 +649,11 @@ netxen_setup_pci_map(struct netxen_adapter *adapter)
 	adapter->ahw.pci_base1 = mem_ptr1;
 	adapter->ahw.pci_base2 = mem_ptr2;
 
-	if (!NX_IS_REVISION_P2(adapter->ahw.revision_id)) {
+	if (NX_IS_REVISION_P3P(adapter->ahw.revision_id)) {
+		adapter->ahw.ocm_win_crb = netxen_get_ioaddr(adapter,
+			NETXEN_PCIX_PS_REG(PCIX_OCM_WINDOW_REG(pci_func)));
+
+	} else if (NX_IS_REVISION_P3(adapter->ahw.revision_id)) {
 		adapter->ahw.ocm_win_crb = netxen_get_ioaddr(adapter,
 			NETXEN_PCIX_PS_REG(PCIE_MN_WINDOW_REG(pci_func)));
 	}
-- 
1.6.0.2
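
As a rough illustration of the new windowing macro, the sketch below splits an
address into the window value computed by OCM_WIN_P3P() and the offset inside
a 2M-mapped window computed by GET_MEM_OFFS_2M(). The struct and function are
hypothetical; only the two masks come from the patch context.

```c
#include <stdint.h>

#define MASK(n)			((1ULL << (n)) - 1)
#define OCM_WIN_P3P(addr)	((addr) & 0xffc0000)
#define GET_MEM_OFFS_2M(addr)	((addr) & MASK(18))

/* Hypothetical holder for the two parts of an on-chip memory address. */
struct demo_ocm_target {
	uint32_t window;	/* address bits 18..27, written to the window register */
	uint32_t offset;	/* address bits 0..17, offset inside the window */
};

struct demo_ocm_target demo_split_ocm_addr(uint64_t addr)
{
	struct demo_ocm_target t;

	t.window = (uint32_t)OCM_WIN_P3P(addr);
	t.offset = (uint32_t)GET_MEM_OFFS_2M(addr);
	return t;
}
```

The two masks do not overlap, so window and offset together cover address
bits 0 to 27.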


^ permalink raw reply related

* [PATCH NEXT 3/7] netxen: 128 memory controller support
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, Amit Kumar Salecha, Amit Kumar Salecha
In-Reply-To: <1255673353-31797-1-git-send-email-dhananjay@netxen.com>

From: Amit Kumar Salecha <amit@qlogic.com>

Future revisions of the chip use 128-bit memory
transactions, which requires the driver to implement
read-modify-write (rmw) for sub-128-bit accesses. This
is mostly used by diagnostic tools.

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic_hdr.h |    8 ++++-
 drivers/net/netxen/netxen_nic_hw.c  |   55 +++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_hdr.h b/drivers/net/netxen/netxen_nic_hdr.h
index 3461350..d40fe33 100644
--- a/drivers/net/netxen/netxen_nic_hdr.h
+++ b/drivers/net/netxen/netxen_nic_hdr.h
@@ -678,10 +678,14 @@ enum {
 #define MIU_TEST_AGT_ADDR_HI		(0x08)
 #define MIU_TEST_AGT_WRDATA_LO		(0x10)
 #define MIU_TEST_AGT_WRDATA_HI		(0x14)
-#define MIU_TEST_AGT_WRDATA(i)		(0x10+(4*(i)))
+#define MIU_TEST_AGT_WRDATA_UPPER_LO	(0x20)
+#define MIU_TEST_AGT_WRDATA_UPPER_HI	(0x24)
+#define MIU_TEST_AGT_WRDATA(i)		(0x10+(0x10*((i)>>1))+(4*((i)&1)))
 #define MIU_TEST_AGT_RDDATA_LO		(0x18)
 #define MIU_TEST_AGT_RDDATA_HI		(0x1c)
-#define MIU_TEST_AGT_RDDATA(i)		(0x18+(4*(i)))
+#define MIU_TEST_AGT_RDDATA_UPPER_LO	(0x28)
+#define MIU_TEST_AGT_RDDATA_UPPER_HI	(0x2c)
+#define MIU_TEST_AGT_RDDATA(i)		(0x18+(0x10*((i)>>1))+(4*((i)&1)))
 
 #define MIU_TEST_AGT_ADDR_MASK		0xfffffff8
 #define MIU_TEST_AGT_UPPER_ADDR(off)	(0)
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index d067bee..52a2f2d 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -1569,8 +1569,9 @@ static int
 netxen_nic_pci_mem_write_2M(struct netxen_adapter *adapter,
 		u64 off, u64 data)
 {
-	int j, ret;
+	int i, j, ret;
 	u32 temp, off8;
+	u64 stride;
 	void __iomem *mem_crb;
 
 	/* Only 64-bit aligned access */
@@ -1597,14 +1598,45 @@ netxen_nic_pci_mem_write_2M(struct netxen_adapter *adapter,
 	return -EIO;
 
 correct:
-	off8 = off & MIU_TEST_AGT_ADDR_MASK;
+	stride = NX_IS_REVISION_P3P(adapter->ahw.revision_id) ? 16 : 8;
+
+	off8 = off & ~(stride-1);
 
 	spin_lock(&adapter->ahw.mem_lock);
 
 	writel(off8, (mem_crb + MIU_TEST_AGT_ADDR_LO));
 	writel(0, (mem_crb + MIU_TEST_AGT_ADDR_HI));
-	writel(data & 0xffffffff, mem_crb + MIU_TEST_AGT_WRDATA_LO);
-	writel((data >> 32) & 0xffffffff, mem_crb + MIU_TEST_AGT_WRDATA_HI);
+
+	i = 0;
+	if (stride == 16) {
+		writel(TA_CTL_ENABLE, (mem_crb + TEST_AGT_CTRL));
+		writel((TA_CTL_START | TA_CTL_ENABLE),
+				(mem_crb + TEST_AGT_CTRL));
+
+		for (j = 0; j < MAX_CTL_CHECK; j++) {
+			temp = readl(mem_crb + TEST_AGT_CTRL);
+			if ((temp & TA_CTL_BUSY) == 0)
+				break;
+		}
+
+		if (j >= MAX_CTL_CHECK) {
+			ret = -EIO;
+			goto done;
+		}
+
+		i = (off & 0xf) ? 0 : 2;
+		writel(readl(mem_crb + MIU_TEST_AGT_RDDATA(i)),
+				mem_crb + MIU_TEST_AGT_WRDATA(i));
+		writel(readl(mem_crb + MIU_TEST_AGT_RDDATA(i+1)),
+				mem_crb + MIU_TEST_AGT_WRDATA(i+1));
+		i = (off & 0xf) ? 2 : 0;
+	}
+
+	writel(data & 0xffffffff,
+			mem_crb + MIU_TEST_AGT_WRDATA(i));
+	writel((data >> 32) & 0xffffffff,
+			mem_crb + MIU_TEST_AGT_WRDATA(i+1));
+
 	writel((TA_CTL_ENABLE | TA_CTL_WRITE), (mem_crb + TEST_AGT_CTRL));
 	writel((TA_CTL_START | TA_CTL_ENABLE | TA_CTL_WRITE),
 			(mem_crb + TEST_AGT_CTRL));
@@ -1623,6 +1655,7 @@ correct:
 	} else
 		ret = 0;
 
+done:
 	spin_unlock(&adapter->ahw.mem_lock);
 
 	return ret;
@@ -1634,7 +1667,7 @@ netxen_nic_pci_mem_read_2M(struct netxen_adapter *adapter,
 {
 	int j, ret;
 	u32 temp, off8;
-	u64 val;
+	u64 val, stride;
 	void __iomem *mem_crb;
 
 	/* Only 64-bit aligned access */
@@ -1663,7 +1696,9 @@ netxen_nic_pci_mem_read_2M(struct netxen_adapter *adapter,
 	return -EIO;
 
 correct:
-	off8 = off & MIU_TEST_AGT_ADDR_MASK;
+	stride = NX_IS_REVISION_P3P(adapter->ahw.revision_id) ? 16 : 8;
+
+	off8 = off & ~(stride-1);
 
 	spin_lock(&adapter->ahw.mem_lock);
 
@@ -1684,9 +1719,13 @@ correct:
 					"failed to read through agent\n");
 		ret = -EIO;
 	} else {
-		temp = readl(mem_crb + MIU_TEST_AGT_RDDATA_HI);
+		off8 = MIU_TEST_AGT_RDDATA_LO;
+		if ((stride == 16) && (off & 0xf))
+			off8 = MIU_TEST_AGT_RDDATA_UPPER_LO;
+
+		temp = readl(mem_crb + off8 + 4);
 		val = (u64)temp << 32;
-		val |= readl(mem_crb + MIU_TEST_AGT_RDDATA_LO);
+		val |= readl(mem_crb + off8);
 		*data = val;
 		ret = 0;
 	}
-- 
1.6.0.2
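
The register indexing introduced above can be checked with a small user-space
sketch. The two macros are copied from the patch; the demo_ helpers are
hypothetical and only mirror the alignment and half-selection logic.

```c
#define MIU_TEST_AGT_WRDATA(i)	(0x10 + (0x10 * ((i) >> 1)) + (4 * ((i) & 1)))
#define MIU_TEST_AGT_RDDATA(i)	(0x18 + (0x10 * ((i) >> 1)) + (4 * ((i) & 1)))

/* Align an offset down to the transaction size (8 or 16 bytes). */
unsigned long long demo_align(unsigned long long off, unsigned long long stride)
{
	return off & ~(stride - 1);
}

/* Index of the first 32-bit data register for a 64-bit access at off. */
int demo_data_index(unsigned long long off, unsigned long long stride)
{
	if (stride == 16 && (off & 0xf))
		return 2;	/* upper 64 bits of the 128-bit word */
	return 0;		/* lower 64 bits */
}
```

Indices 0 and 1 map to the existing LO/HI registers (0x10, 0x14 for writes),
and indices 2 and 3 map to the new UPPER_LO/UPPER_HI registers (0x20, 0x24).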


^ permalink raw reply related

* [PATCH NEXT 2/7] netxen: defines for next revision
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, Amit Kumar Salecha, Amit Kumar Salecha
In-Reply-To: <1255673353-31797-1-git-send-email-dhananjay@netxen.com>

From: Amit Kumar Salecha <amit@qlogic.com>

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 3d990e5..b1aca52 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -117,9 +117,11 @@
 #define NX_P3_B0		0x40
 #define NX_P3_B1		0x41
 #define NX_P3_B2		0x42
+#define NX_P3P_A0		0x50
 
 #define NX_IS_REVISION_P2(REVISION)     (REVISION <= NX_P2_C1)
 #define NX_IS_REVISION_P3(REVISION)     (REVISION >= NX_P3_A0)
+#define NX_IS_REVISION_P3P(REVISION)     (REVISION >= NX_P3P_A0)
 
 #define FIRST_PAGE_GROUP_START	0
 #define FIRST_PAGE_GROUP_END	0x100000
-- 
1.6.0.2
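
One consequence of these definitions is that a P3P revision also satisfies
NX_IS_REVISION_P3(), so callers that need to tell them apart have to test for
P3P first. A minimal sketch, with the macro arguments parenthesized and with
NX_P3_A0 set to an assumed value because it is not shown in the hunk; the
demo_ names are hypothetical:

```c
#define NX_P3_A0	0x30	/* assumed value; not shown in the hunk */
#define NX_P3_B0	0x40
#define NX_P3P_A0	0x50

#define NX_IS_REVISION_P3(REVISION)	((REVISION) >= NX_P3_A0)
#define NX_IS_REVISION_P3P(REVISION)	((REVISION) >= NX_P3P_A0)

enum demo_family { DEMO_OLDER, DEMO_P3, DEMO_P3P };

/* Check the narrower range first, since P3P is a subset of P3 here. */
enum demo_family demo_classify(int revision_id)
{
	if (NX_IS_REVISION_P3P(revision_id))
		return DEMO_P3P;
	if (NX_IS_REVISION_P3(revision_id))
		return DEMO_P3;
	return DEMO_OLDER;
}
```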


^ permalink raw reply related

* [PATCH NEXT 1/7] netxen; update version to 4.0.56
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1255673353-31797-1-git-send-email-dhananjay@netxen.com>

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 1bdb8f4..3d990e5 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -53,8 +53,8 @@
 
 #define _NETXEN_NIC_LINUX_MAJOR 4
 #define _NETXEN_NIC_LINUX_MINOR 0
-#define _NETXEN_NIC_LINUX_SUBVERSION 50
-#define NETXEN_NIC_LINUX_VERSIONID  "4.0.50"
+#define _NETXEN_NIC_LINUX_SUBVERSION 56
+#define NETXEN_NIC_LINUX_VERSIONID  "4.0.56"
 
 #define NETXEN_VERSION_CODE(a, b, c)	(((a) << 24) + ((b) << 16) + (c))
 #define _major(v)	(((v) >> 24) & 0xff)
-- 
1.6.0.2
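
For reference, the version packing visible in the context lines works out as
follows for 4.0.56. The two macros are taken from the hunk; the comparison
helper is a hypothetical addition for illustration.

```c
#define NETXEN_VERSION_CODE(a, b, c)	(((a) << 24) + ((b) << 16) + (c))
#define _major(v)	(((v) >> 24) & 0xff)

/* Hypothetical helper: is the packed version "have" at least a.b.c? */
int demo_version_at_least(unsigned int have, int a, int b, int c)
{
	return have >= (unsigned int)NETXEN_VERSION_CODE(a, b, c);
}
```

NETXEN_VERSION_CODE(4, 0, 56) evaluates to 0x04000038, so packed versions can
be compared with ordinary integer comparisons.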


^ permalink raw reply related

* [PATCH NEXT 4/7] netxen: reset sequence changes
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, Amit Kumar Salecha, Amit Kumar Salecha
In-Reply-To: <1255673353-31797-1-git-send-email-dhananjay@netxen.com>

From: Amit Kumar Salecha <amit@qlogic.com>

Future revisions need a different chip reset sequence
and firmware initialization.

Also clean up some never used debug code.

Signed-off-by: Amit Kumar Salecha <amit@netxen.com>
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
---
 drivers/net/netxen/netxen_nic.h      |    2 +-
 drivers/net/netxen/netxen_nic_init.c |   41 +++++++++++----------------------
 drivers/net/netxen/netxen_nic_main.c |    2 +-
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index b1aca52..ae4bc7b 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -1270,7 +1270,7 @@ int netxen_load_firmware(struct netxen_adapter *adapter);
 int netxen_need_fw_reset(struct netxen_adapter *adapter);
 void netxen_request_firmware(struct netxen_adapter *adapter);
 void netxen_release_firmware(struct netxen_adapter *adapter);
-int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose);
+int netxen_pinit_from_rom(struct netxen_adapter *adapter);
 
 int netxen_rom_fast_read(struct netxen_adapter *adapter, int addr, int *valp);
 int netxen_rom_fast_read_words(struct netxen_adapter *adapter, int addr,
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 83387c7..a524844 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -437,7 +437,7 @@ int netxen_rom_fast_read(struct netxen_adapter *adapter, int addr, int *valp)
 #define NETXEN_BOARDNUM 		0x400c
 #define NETXEN_CHIPNUM			0x4010
 
-int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
+int netxen_pinit_from_rom(struct netxen_adapter *adapter)
 {
 	int addr, val;
 	int i, n, init_delay = 0;
@@ -450,21 +450,6 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
 	NXWR32(adapter, NETXEN_ROMUSB_GLB_SW_RESET, 0xffffffff);
 	netxen_rom_unlock(adapter);
 
-	if (verbose) {
-		if (netxen_rom_fast_read(adapter, NETXEN_BOARDTYPE, &val) == 0)
-			printk("P2 ROM board type: 0x%08x\n", val);
-		else
-			printk("Could not read board type\n");
-		if (netxen_rom_fast_read(adapter, NETXEN_BOARDNUM, &val) == 0)
-			printk("P2 ROM board  num: 0x%08x\n", val);
-		else
-			printk("Could not read board number\n");
-		if (netxen_rom_fast_read(adapter, NETXEN_CHIPNUM, &val) == 0)
-			printk("P2 ROM chip   num: 0x%08x\n", val);
-		else
-			printk("Could not read chip number\n");
-	}
-
 	if (NX_IS_REVISION_P3(adapter->ahw.revision_id)) {
 		if (netxen_rom_fast_read(adapter, 0, &n) != 0 ||
 			(n != 0xcafecafe) ||
@@ -486,11 +471,7 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
 		n &= ~0x80000000;
 	}
 
-	if (n < 1024) {
-		if (verbose)
-			printk(KERN_DEBUG "%s: %d CRB init values found"
-			       " in ROM.\n", netxen_nic_driver_name, n);
-	} else {
+	if (n >= 1024) {
 		printk(KERN_ERR "%s:n=0x%x Error! NetXen card flash not"
 		       " initialized.\n", __func__, n);
 		return -EIO;
@@ -502,6 +483,7 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
 				netxen_nic_driver_name);
 		return -ENOMEM;
 	}
+
 	for (i = 0; i < n; i++) {
 		if (netxen_rom_fast_read(adapter, 8*i + 4*offset, &val) != 0 ||
 		netxen_rom_fast_read(adapter, 8*i + 4*offset + 4, &addr) != 0) {
@@ -512,11 +494,8 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
 		buf[i].addr = addr;
 		buf[i].data = val;
 
-		if (verbose)
-			printk(KERN_DEBUG "%s: PCI:     0x%08x == 0x%08x\n",
-				netxen_nic_driver_name,
-				(u32)netxen_decode_crb_addr(addr), val);
 	}
+
 	for (i = 0; i < n; i++) {
 
 		off = netxen_decode_crb_addr(buf[i].addr);
@@ -526,6 +505,10 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
 			continue;
 		}
 		off += NETXEN_PCI_CRBSPACE;
+
+		if (off & 1)
+			continue;
+
 		/* skipping cold reboot MAGIC */
 		if (off == NETXEN_CAM_RAM(0x1fc))
 			continue;
@@ -542,7 +525,8 @@ int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose)
 				continue;
 			if (off == (ROMUSB_GLB + 0x1c)) /* MS clock */
 				continue;
-			if (off == (NETXEN_CRB_PEG_NET_1 + 0x18))
+			if (off == (NETXEN_CRB_PEG_NET_1 + 0x18) &&
+				!NX_IS_REVISION_P3P(adapter->ahw.revision_id))
 				buf[i].data = 0x1020;
 			/* skip the function enable register */
 			if (off == NETXEN_PCIE_REG(PCIE_SETUP_FUNCTION))
@@ -751,7 +735,10 @@ netxen_load_firmware(struct netxen_adapter *adapter)
 	}
 	msleep(1);
 
-	if (NX_IS_REVISION_P3(adapter->ahw.revision_id))
+	if (NX_IS_REVISION_P3P(adapter->ahw.revision_id)) {
+		NXWR32(adapter, NETXEN_CRB_PEG_NET_0 + 0x18, 0x1020);
+		NXWR32(adapter, NETXEN_ROMUSB_GLB_SW_RESET, 0x80001e);
+	} else if (NX_IS_REVISION_P3(adapter->ahw.revision_id))
 		NXWR32(adapter, NETXEN_ROMUSB_GLB_SW_RESET, 0x80001d);
 	else {
 		NXWR32(adapter, NETXEN_ROMUSB_GLB_CHIP_CLK_CTRL, 0x3fff);
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 5bc8520..2d772dd 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -810,7 +810,7 @@ netxen_start_firmware(struct netxen_adapter *adapter)
 
 	if (first_boot != 0x55555555) {
 		NXWR32(adapter, CRB_CMDPEG_STATE, 0);
-		netxen_pinit_from_rom(adapter, 0);
+		netxen_pinit_from_rom(adapter);
 		msleep(1);
 	}
 
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH NEXT 0/7] netxen: changes for future chip revisions
From: Dhananjay Phadke @ 2009-10-16  6:09 UTC (permalink / raw)
  To: davem; +Cc: netdev

A 7-patch series to add initial support for the next
chip revision.

Please apply to net-next-2.6 tree.

Thanks,
	Dhananjay



^ permalink raw reply

* Re: [PATCH 0/2] Add implementation of CCID4 into the DCCP test tree
From: Gerrit Renker @ 2009-10-16  6:07 UTC (permalink / raw)
  To: Ivo Calado; +Cc: dccp, netdev
In-Reply-To: <4AD4B89C.6010704@embedded.ufcg.edu.br>

| These patches add implementation of CCID4 into the DCCP test tree.
Thank you for sending these.

I have taken all three series,
 * 4-part TFRC-SP receiver set,
 * 4-part TFRC-SP sender set,
 * 2-part CCID-4 implementation set,
and replaced the old ccid4 subtree of the DCCP test tree.


You can check out the whole CCID-4 subtree via
    git://eden-feed.erg.abdn.ac.uk/dccp_exp    => subtree 'ccid4'

and view (or download) the CCID-4 patch series at
    http://eden-feed.erg.abdn.ac.uk/cgi-bin/gitweb.cgi?p=dccp_exp.git;a=log;h=ccid4

These are all your patches, no edits apart from removing some trailing whitespace.
If you make changes to your patches please send them relative to this tree.

So far I have only verified that they build cleanly; I will be back next week after
some more tests.

^ permalink raw reply

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Eric Dumazet @ 2009-10-16  6:05 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Julian Anastasov, David Miller, netdev
In-Reply-To: <20091016052913.GB5574@1wt.eu>

Willy Tarreau wrote:
> On Fri, Oct 16, 2009 at 07:00:49AM +0200, Eric Dumazet wrote:
>> Eric Dumazet wrote:
>>>
>>> So, it appears defer_accept value is not an inherited attribute,
>>> but shared by all embryons. Therefore we should not touch it.
>>>
>>> Of course it should be done, or add a new connection field to count number
>>> of pure ACKS received on each SYN_RECV embryon.
>>>
>> Could be something like this ? (on top of net-next-2.6 of course)
>>
>> 7 bits is more than enough, we could take 5 bits IMHO.
> 
> Couldn't we just rely on the retrans vs rskq_defer_accept comparison ?
> 

In this case, we lose the TCP_DEFER_ACCEPT advantage if one SYN-ACK was dropped
by the network: we wake up the listening server when the first ACK comes from the
client, instead of really waiting for the request.

I think being able to count pure ACKs would be slightly better, and would cost nothing.


retrans is the number of SYN-ACKs (re)sent, while req_acks would count the number of
pure ACKs received.

In an ideal world those numbers should be related, but could they differ in the real world?
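
To make the difference concrete, here is a minimal sketch in plain userspace C,
not kernel code. struct toy_req, both function names and the exact comparisons
are assumptions for illustration only; just retrans, req_acks and the
defer_accept limit come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one SYN_RECV request; NOT the kernel's struct request_sock. */
struct toy_req {
	unsigned int retrans;	/* SYN-ACKs (re)sent for this request */
	unsigned int req_acks;	/* hypothetical: pure ACKs absorbed so far */
};

/*
 * Policy A (assumed): reuse retrans. A bare ACK wakes the listener as soon
 * as enough SYN-ACKs have been (re)sent, whatever the reason for resending.
 */
static bool wake_on_ack_by_retrans(const struct toy_req *req,
				   unsigned int defer_accept)
{
	assert(req);
	return req->retrans >= defer_accept;
}

/*
 * Policy B (assumed): count pure ACKs per request. A bare ACK is absorbed
 * and counted until defer_accept of them have been seen.
 */
static bool wake_on_ack_by_count(struct toy_req *req,
				 unsigned int defer_accept)
{
	assert(req);
	if (req->req_acks >= defer_accept)
		return true;
	req->req_acks++;	/* absorb this ACK, keep waiting for data */
	return false;
}
```

With defer_accept = 1 and one SYN-ACK already retransmitted because the first
was lost (retrans = 1, req_acks = 0), policy A wakes the listener on the first
bare ACK, while policy B absorbs that ACK and keeps waiting.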


^ permalink raw reply

