Netdev List
* Re: [net-next-2.6 PATCH] igb: add support for reporting 5GT/s during probe on PCIe Gen2
From: David Miller @ 2010-04-27 19:55 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20100427110238.23921.24825.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 04:02:40 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This change corrects the fact that we were not reporting Gen2 link speeds
> when we were in fact connected at Gen2 rates.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/2] ixgbe: enable extremely low latency
From: David Miller @ 2010-04-27 19:56 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jesse.brandeburg
In-Reply-To: <20100427113651.24431.9221.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 04:37:20 -0700

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> 82598/82599 can support EITR == 0, which allows for the
> absolutely lowest latency setting in the hardware.  This disables
> writeback batching and anything else that relies upon a delayed
> interrupt. This patch adds an "override": when a user sets
> rx-usecs to zero, the driver will respect that setting over
> using RSC, and automatically disable RSC.  If rx-usecs is
> used to set the EITR value to 0, then the driver should disable
> LRO (aka RSC) internally until EITR is set to non-zero again.
> 
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/2] ixgbe: fix bug when EITR=0 causing no writebacks
From: David Miller @ 2010-04-27 19:56 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jesse.brandeburg
In-Reply-To: <20100427113739.24431.46358.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 04:37:41 -0700

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> writebacks can be held indefinitely by hardware if EITR=0, when
> combined with TXDCTL.WTHRESH=8.  When EITR=0, WTHRESH should be
> set back to zero.
> 
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbe: ixgbe_down needs to stop dev_watchdog
From: David Miller @ 2010-04-27 19:56 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, john.r.fastabend, peter.p.waskiewicz.jr
In-Reply-To: <20100427121300.25038.2341.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 05:13:39 -0700

> From: John Fastabend <john.r.fastabend@intel.com>
> 
> There is a small race between when the tx queues are stopped
> and when netif_carrier_off() is called in ixgbe_down.  If the
> dev_watchdog() timer fires during this time it is possible for
> a false tx timeout to occur.
> 
> This patch moves the netif_carrier_off() so that it is called before
> the tx queues are stopped, preventing the dev_watchdog timer from
> detecting false tx timeouts.  The race is seen occasionally when
> FCoE or DCB settings are being configured or changed.
> 
> Testing note, running ifconfig up/down will not reproduce this
> issue because dev_open/dev_close call dev_deactivate() and then
> dev_activate().
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] rps: inet_rps_save_rxhash() argument is not const
From: David Miller @ 2010-04-27 19:56 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1272372171.2295.68.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 27 Apr 2010 14:42:51 +0200

> const qualifier on sock argument is misleading, since we can modify rxhash.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

It's a shame that because of the cast the compiler can't see this.
Instead it would be nice if the compiler only allowed casts from
a const pointer to another const pointer.
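
To illustrate the point about casts, here is a minimal userspace sketch.
The struct and function names are hypothetical and are not the kernel's
inet_rps_save_rxhash() code; they only show how a cast inside a function
can silently discard the const promise made by its prototype.

```c
/* Hypothetical stand-in for a socket structure; not the kernel's struct sock. */
struct sock_sketch {
	unsigned int rxhash;
};

/* The prototype promises not to modify *sk ... */
static void save_rxhash_const(const struct sock_sketch *sk, unsigned int hash)
{
	/* ... but the cast drops the qualifier, so the compiler stays silent. */
	((struct sock_sketch *)sk)->rxhash = hash;
}

/* Honest prototype: the argument is visibly writable. */
static void save_rxhash(struct sock_sketch *sk, unsigned int hash)
{
	sk->rxhash = hash;
}
```

With GCC, building with -Wcast-qual should flag the cast in the first
function, which is about as close as one gets to the check wished for
above.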

^ permalink raw reply

* Re: [PATCH 0/4] net: ipmr netlink interface for route dumping
From: David Miller @ 2010-04-27 19:59 UTC (permalink / raw)
  To: kaber; +Cc: netdev
In-Reply-To: <1272374785-3858-1-git-send-email-kaber@trash.net>

From: Patrick@trash.net, McHardy@trash.net, kaber@trash.net
Date: Tue, 27 Apr 2010 15:26:22 +0200

> Please apply or pull from:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6.git master
> 

Pulled, thanks Patrick.

And good luck finding those missing quotes in your git send-email
scripting :-)

^ permalink raw reply

* Re: [net-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: Jeff Kirsher @ 2010-04-27 20:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, gospo, donald.c.skidmore
In-Reply-To: <20100427.095629.191403481.davem@davemloft.net>

On Tue, Apr 27, 2010 at 09:56, David Miller <davem@davemloft.net> wrote:
>
> This is also not appropriate for net-2.6, it doesn't fix a
> regression in the regression list and it doesn't fix a catastrophic
> crash or failure.
>
> You really have to be kidding me if you think a patch like this
> is fine this late in the -RC series.
>
> You also aren't even numbering your patches, which is quite a
> transgression when submitting a set of patches that all touch the
> same driver and/or files.
> --

My apologies.  My reasoning for not numbering them was that the
patches were not related and not dependent upon each other.

As far as sending the three unacceptable patches this late in the
-rc series, that was poor judgement on my part, sorry.

-- 
Cheers,
Jeff

^ permalink raw reply

* Re: [PATCH 1/3] ptp: Added a brand new class driver for ptp clocks.
From: David Miller @ 2010-04-27 20:07 UTC (permalink / raw)
  To: richardcochran; +Cc: netdev
In-Reply-To: <20100427091405.GA5098@riccoc20.at.omicron.at>

From: Richard Cochran <richardcochran@gmail.com>
Date: Tue, 27 Apr 2010 11:14:05 +0200

> +struct ptp_clock {
> +	struct cdev cdev;
> +	struct device *dev;
> +	struct ptp_clock_info *info;
> +	dev_t devid;
> +	int index; /* index into clocks[], also the minor number */
> +	struct semaphore mux; /* one process at a time on a device */
> +};

A mutex works just as well and is preferable to a semaphore.

> +/* private globals */
> +
> +static const struct file_operations ptp_fops;
> +static dev_t ptp_devt;
> +static struct class *ptp_class;
> +struct ptp_clock *clocks[PTP_MAX_CLOCKS];
> +DEFINE_SPINLOCK(clocks_lock);

The clocks[] table is not protected by any mutual exclusion in the
unregister method, it needs at least a spinlock or similar.  Probably
clocks_lock was meant to be used for this purpose.

Also, having arbitrary limits like PTP_MAX_CLOCKS and a linear scan
when registering or unregistering is suboptimal.

Even if we're not expecting to have many of these things, use a
linux/list.h list to manage them.

In fact, if you keep them in a list you don't need to look them up at
all, at least during unregister: you can return the real "struct
ptp_clock *" as an opaque ERR_PTR() back to the caller on register and
on unregister you can just list_del() on it.

Don't expose the layout of struct ptp_clock to the users, you don't have
to.  Just:

struct ptp_clock;

in the exported header file, and then you can return "struct ptp_clock *"
from ptp_clock_register() just fine.
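
A rough userspace sketch of the suggestion above, assuming nothing about
the eventual ptp code: all names here (clock_handle, clock_register,
clock_unregister, clock_count) are made up for illustration, the list is
hand-rolled instead of using linux/list.h, and locking is left out. It
only shows the shape of the idea: callers hold an opaque pointer, and
unregister unlinks that pointer directly with no table scan.

```c
#include <stdlib.h>

/* A public header would contain only this forward declaration. */
struct clock_handle;

/* Private definition, kept in the implementation file. */
struct clock_handle {
	struct clock_handle *prev, *next;
	int id;
};

/*
 * Circular list head. A kernel version would use struct list_head
 * and take a lock around every update.
 */
static struct clock_handle registry = { &registry, &registry, -1 };

/* Allocate a handle, append it to the list, return it as an opaque pointer. */
struct clock_handle *clock_register(int id)
{
	struct clock_handle *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->id = id;
	c->prev = registry.prev;
	c->next = &registry;
	registry.prev->next = c;
	registry.prev = c;
	return c;
}

/* No lookup needed: the handle itself says where it sits in the list. */
void clock_unregister(struct clock_handle *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
	free(c);
}

/* Walk the list; only here to make the sketch observable. */
int clock_count(void)
{
	int n = 0;

	for (struct clock_handle *c = registry.next; c != &registry; c = c->next)
		n++;
	return n;
}
```

There is no fixed upper bound on the number of entries, and both
register and unregister are constant time.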


^ permalink raw reply

* [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
From: Eric Dumazet @ 2010-04-27 20:21 UTC (permalink / raw)
  To: David Miller; +Cc: bmb, therbert, netdev, rick.jones2
In-Reply-To: <1272389872.2295.405.camel@edumazet-laptop>

Le mardi 27 avril 2010 à 19:37 +0200, Eric Dumazet a écrit :

> We might use the ticket spinlock paradigm to let writers go in parallel
> and let the user take the socket lock
> 
> Instead of having the bh_lock_sock() to protect receive_queue *and*
> backlog, writers get a unique slot in a table, that 'user' can handle
> later.
> 
> Or serialize writers (before they try to bh_lock_sock()) with a
> dedicated lock, so that user has 50% chances to get the sock lock,
> contending with at most one writer.

Following patch fixes the issue for me, with little performance hit on
fast path.

Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on my dev machine.

Thanks !

[PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account

The current socket backlog limit is not enough to really stop DDoS attacks,
because the user thread spends a lot of time processing a full backlog each
round, and the user might spin heavily on the socket lock.

We should add the backlog size and the receive_queue size (aka rmem_alloc) to
pace writers, and let the user run without being slowed down too much.

Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
stress situations.

Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on an 8-core machine.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |   13 +++++++++++--
 net/core/sock.c    |    5 ++++-
 net/ipv4/udp.c     |    4 ++++
 net/ipv6/udp.c     |    8 ++++++++
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 86a8ca1..4b0097d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -255,7 +255,6 @@ struct sock {
 		struct sk_buff *head;
 		struct sk_buff *tail;
 		int len;
-		int limit;
 	} sk_backlog;
 	wait_queue_head_t	*sk_sleep;
 	struct dst_entry	*sk_dst_cache;
@@ -604,10 +603,20 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 	skb->next = NULL;
 }
 
+/*
+ * Take into account size of receive queue and backlog queue
+ */
+static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
+{
+	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
+
+	return qsize + skb->truesize > sk->sk_rcvbuf;
+}
+
 /* The per-socket spinlock must be held here. */
 static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
-	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
+	if (sk_rcvqueues_full(sk, skb))
 		return -ENOBUFS;
 
 	__sk_add_backlog(sk, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 58ebd14..5104175 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -327,6 +327,10 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
 
 	skb->dev = NULL;
 
+	if (sk_rcvqueues_full(sk, skb)) {
+		atomic_inc(&sk->sk_drops);
+		goto discard_and_relse;
+	}
 	if (nested)
 		bh_lock_sock_nested(sk);
 	else
@@ -1885,7 +1889,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_allocation	=	GFP_KERNEL;
 	sk->sk_rcvbuf		=	sysctl_rmem_default;
 	sk->sk_sndbuf		=	sysctl_wmem_default;
-	sk->sk_backlog.limit	=	sk->sk_rcvbuf << 1;
 	sk->sk_state		=	TCP_CLOSE;
 	sk_set_socket(sk, sock);
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1e18f9c..776c844 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1372,6 +1372,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 			goto drop;
 	}
 
+
+	if (sk_rcvqueues_full(sk, skb))
+		goto drop;
+
 	rc = 0;
 
 	bh_lock_sock(sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2850e35..3ead20a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -584,6 +584,10 @@ static void flush_stack(struct sock **stack, unsigned int count,
 
 		sk = stack[i];
 		if (skb1) {
+			if (sk_rcvqueues_full(sk, skb)) {
+				kfree_skb(skb1);
+				goto drop;
+			}
 			bh_lock_sock(sk);
 			if (!sock_owned_by_user(sk))
 				udpv6_queue_rcv_skb(sk, skb1);
@@ -759,6 +763,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	/* deliver */
 
+	if (sk_rcvqueues_full(sk, skb)) {
+		sock_put(sk);
+		goto discard;
+	}
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		udpv6_queue_rcv_skb(sk, skb);
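
For readers who want to play with the arithmetic outside the kernel, the
following is a userspace sketch of the check in the patch above. The
struct and function names are hypothetical stand-ins (plain integers
instead of struct sock, no atomics), and the "old" check assumes
sk_backlog.limit was left at its default of sk_rcvbuf << 1.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three socket fields the check reads. */
struct rcv_sketch {
	unsigned int backlog_len;	/* bytes queued in the backlog */
	unsigned int rmem_alloc;	/* bytes queued in the receive queue */
	int rcvbuf;			/* receive buffer limit */
};

/* Old rule (sketch): only the backlog counts, against twice rcvbuf. */
static bool old_backlog_full(const struct rcv_sketch *sk)
{
	return sk->backlog_len >= (unsigned int)(sk->rcvbuf << 1);
}

/* New rule: backlog plus receive queue plus this packet, against rcvbuf. */
static bool rcvqueues_full(const struct rcv_sketch *sk, unsigned int truesize)
{
	unsigned int qsize = sk->backlog_len + sk->rmem_alloc;

	return qsize + truesize > (unsigned int)sk->rcvbuf;
}
```

With 60000 bytes in the backlog, 64000 bytes in the receive queue and a
rcvbuf of 124928, a 2048 byte packet is refused by the new rule before
the socket lock is ever taken, while the old rule would still have
queued it.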



^ permalink raw reply related

* Re: [net-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: David Miller @ 2010-04-27 20:32 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, donald.c.skidmore
In-Reply-To: <u2n9929d2391004271305y9732d81r8b611fbb9f9e8b9a@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 13:05:25 -0700

> As far as sending the three unacceptable patches this late in the
> -rc series, that was poor judgement on my part, sorry.

No worries.  I just pushed out the net-next-2.6 patches you sent to me,
so please respin these patches originally targeted to net-2.6 so
I can apply them to net-next-2.6.

Thanks.

^ permalink raw reply

* Re: [PATCH v2] net: reimplement softnet_data.output_queue as a FIFO queue
From: Eric Dumazet @ 2010-04-27 20:36 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <1272359184-2929-1-git-send-email-xiaosuo@gmail.com>

Le mardi 27 avril 2010 à 17:06 +0800, Changli Gao a écrit :
> reimplement softnet_data.output_queue as a FIFO queue.
> 
> reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
> the qdiscs rescheduled.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

> ----
>  include/linux/netdevice.h |    1 +
>  net/core/dev.c            |   22 ++++++++++++----------
>  2 files changed, 13 insertions(+), 10 deletions(-)
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3c5ed5f..c04ca24 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1385,6 +1385,7 @@ static inline int unregister_gifconf(unsigned int family)
>   */
>  struct softnet_data {
>  	struct Qdisc		*output_queue;
> +	struct Qdisc		**output_queue_tailp;
>  	struct list_head	poll_list;
>  	struct sk_buff		*completion_queue;
>  
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4d43f1a..3d31491 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1557,8 +1557,9 @@ static inline void __netif_reschedule(struct Qdisc *q)
>  
>  	local_irq_save(flags);
>  	sd = &__get_cpu_var(softnet_data);
> -	q->next_sched = sd->output_queue;
> -	sd->output_queue = q;
> +	q->next_sched = NULL;
> +	*sd->output_queue_tailp = q;
> +	sd->output_queue_tailp = &q->next_sched;
>  	raise_softirq_irqoff(NET_TX_SOFTIRQ);
>  	local_irq_restore(flags);
>  }
> @@ -2529,6 +2530,7 @@ static void net_tx_action(struct softirq_action *h)
>  		local_irq_disable();
>  		head = sd->output_queue;
>  		sd->output_queue = NULL;
> +		sd->output_queue_tailp = &sd->output_queue;
>  		local_irq_enable();
>  
>  		while (head) {
> @@ -5594,7 +5596,6 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>  			    void *ocpu)
>  {
>  	struct sk_buff **list_skb;
> -	struct Qdisc **list_net;
>  	struct sk_buff *skb;
>  	unsigned int cpu, oldcpu = (unsigned long)ocpu;
>  	struct softnet_data *sd, *oldsd;
> @@ -5615,13 +5616,13 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>  	*list_skb = oldsd->completion_queue;
>  	oldsd->completion_queue = NULL;
>  
> -	/* Find end of our output_queue. */
> -	list_net = &sd->output_queue;
> -	while (*list_net)
> -		list_net = &(*list_net)->next_sched;
>  	/* Append output queue from offline CPU. */
> -	*list_net = oldsd->output_queue;
> -	oldsd->output_queue = NULL;
> +	if (oldsd->output_queue) {
> +		*sd->output_queue_tailp = oldsd->output_queue;
> +		sd->output_queue_tailp = oldsd->output_queue_tailp;
> +		oldsd->output_queue = NULL;
> +		oldsd->output_queue_tailp = &oldsd->output_queue;
> +	}
>  
>  	raise_softirq_irqoff(NET_TX_SOFTIRQ);
>  	local_irq_enable();
> @@ -5851,7 +5852,8 @@ static int __init net_dev_init(void)
>  		skb_queue_head_init(&sd->input_pkt_queue);
>  		sd->completion_queue = NULL;
>  		INIT_LIST_HEAD(&sd->poll_list);
> -
> +		sd->output_queue = NULL;
> +		sd->output_queue_tailp = &sd->output_queue;
>  #ifdef CONFIG_RPS
>  		sd->csd.func = rps_trigger_softirq;
>  		sd->csd.info = sd;
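
The tail-pointer technique used in the patch can be tried in isolation
with a userspace sketch like the one below. The names (node, fifo,
fifo_push, fifo_take_all, fifo_splice) are made up for illustration;
interrupt disabling and per-cpu data are left out on purpose.

```c
#include <stddef.h>

/* Hypothetical element type; next plays the role of next_sched. */
struct node {
	struct node *next;
	int id;
};

/* head plays the role of output_queue, tailp of output_queue_tailp. */
struct fifo {
	struct node *head;
	struct node **tailp;
};

/* Empty queue: the tail pointer refers to the head pointer itself. */
static void fifo_init(struct fifo *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/* O(1) append at the tail, so entries come out in arrival order. */
static void fifo_push(struct fifo *q, struct node *n)
{
	n->next = NULL;
	*q->tailp = n;
	q->tailp = &n->next;
}

/* Detach the whole chain and reset the queue, as the softirq side does. */
static struct node *fifo_take_all(struct fifo *q)
{
	struct node *head = q->head;

	fifo_init(q);
	return head;
}

/* Append src to dst without walking dst, as in the CPU offline path. */
static void fifo_splice(struct fifo *dst, struct fifo *src)
{
	if (src->head) {
		*dst->tailp = src->head;
		dst->tailp = src->tailp;
		fifo_init(src);
	}
}
```

The empty-source test in fifo_splice matters: without it, dst->tailp
would end up pointing at the source's own head pointer.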

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/2] Add ndo_set_vf_port_profile
From: Scott Feldman @ 2010-04-27 20:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Rose, Gregory V, David Miller, netdev@vger.kernel.org,
	chrisw@redhat.com, Williams, Mitch A
In-Reply-To: <201004271435.25480.arnd@arndb.de>

On 4/27/10 5:35 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Tuesday 27 April 2010, Scott Feldman wrote:
>>> Yes, I believe that's there today:
>>> 
>>>     NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
>>> 
>>> The number of VFs is returned in RTM_GETLINK.  But, it's only returned if:
>>> 
>>>     if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent)
>>> 
>>> For my proposal, I'll need to return IFLA_NUM_VF unconditionally so callers
>>> can get num VFs.
>> 
>> Hmmm...seems IFLA_NUM_VF assumes a PCI device supporting SR-IOV when it uses
>> dev_num_vf().  I think a better option would have been to query the device
>> for the number of VFs, without assuming SR-IOV or even PCI.
>> 
>> I see a ndo_get_num_vf() coming...
> 
> Shouldn't the number of registered port profiles be totally independent of
> the number of virtual functions?
> 
> Any of the VFs could multiplex multiple guests using macvlan, which means you
> need to register each guest separately, not each VF.
> 
> Anything that ties port profiles to VFs seems fundamentally flawed AFAICT,
> at least when we want to extend this to adapters that don't do it in firmware.

Ya, I tend to agree.  Let's just make port-profile a setting of any netdev,
an eth, macvtap, eth.x, bond, etc.  That's probably what I should have done
in the first place.  Something like:

       ip link set DEVICE [ { up | down } ]
                          [ arp { on | off } ]
                            <...clip...>
                          [ alias NAME ]
                          [ vf NUM [ mac LLADDR ]
                                   [ vlan VLANID [ qos VLAN-QOS ] ]
                                   [ rate TXRATE ] ]
                          [ port_profile [ PORT-PROFILE
                                   [ mac LLADDR ]
                                   [ host_uuid HOST_UUID ]
                                   [ client_uuid CLIENT_UUID ]
                                   [ client_name CLIENT_NAME ] ] ] ]
       ip link show [ DEVICE ]

I think I was trying to be too accommodating for models with VFs, but, as
you point out, it doesn't matter.

This way, I can get the RTM_GETLINK to return the port-profile in use.

New patches coming soon...

-scott


^ permalink raw reply

* Re: [patch] ipheth: potential null dereferences on error path
From: L. Alberto Giménez @ 2010-04-27 21:00 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Diego Giagio, David S. Miller, netdev, kernel-janitors
In-Reply-To: <20100427092012.GA29093@bicker>

On Tue, Apr 27, 2010 at 11:20:12AM +0200, Dan Carpenter wrote:
> The calls to usb_free_buffer() dereference rx_urb and tx_urb in the
> parameter list but those could be NULL.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Seems good to me (should I ack it, or add some other kind of sign-off?).

-- 
L. Alberto Giménez
JabberID agimenez@jabber.sysvalve.es
GnuPG key ID 0x3BAABDE1

^ permalink raw reply

* powerpc gianfar driver does not work well when PREEMPT/PREEMPT_RT is enabled
From: Xianghua Xiao @ 2010-04-27 21:01 UTC (permalink / raw)
  To: netdev, linux-rt-users

I posted this to linuxppc list originally and hope someone here with
NAPI/COALESCE/RT experience can comment on...
-----------------------------
I'm trying to get 834x/TSEC gianfar.c working with 2.6.33/RT.

When PREEMPT is disabled, the gianfar driver works well.

If PREEMPT is enabled, and especially when PREEMPT_RT is enabled, the
network (gianfar) gets disconnected after about 2-3 minutes under
iperf. If NFS is used, the whole system hangs after a while when NFS
is accessed.

In an older version (2.6.18-rt) where NAPI is disabled, gianfar
performed well under PREEMPT_RT. In the new version of gianfar, NAPI
is enforced (the code is there by default and it's hard to disable NAPI
in the code now); also, TX COALESCE is enabled while RX COALESCE is
disabled. It seems to me NAPI is now the default for Rx and COALESCE is
the default for Tx.

Both NAPI and COALESCE may have negative effects on real-time systems,
where latency is more important than throughput. Unfortunately, after
some experiments, it turns out to be hard to disable either of them now.

Is there anyone here using gianfar with PREEMPT_RT? Do I have to port
an older version of gianfar to get rid of NAPI at least?

Thanks,
Xianghua

^ permalink raw reply

* [PATCH 2/3] bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
From: Michael Chan @ 2010-04-27 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, jfeeney
In-Reply-To: <1272403691-2934-1-git-send-email-mchan@broadcom.com>

The bonding driver calls ndo_vlan_rx_register() while holding bond->lock.
The bnx2 driver calls bnx2_netif_stop() to stop the rx handling while
changing the vlgrp.  The call also stops the cnic driver, which sleeps
while the bond->lock is held and causes the warning.

This code path only needs to stop the NAPI rx handling while we are
changing the vlgrp.  Since no reset is going to occur, there is no need
to stop cnic in this case.  By adding a parameter to bnx2_netif_stop()
to skip stopping cnic, we can avoid the warning.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |   38 ++++++++++++++++++++------------------
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4c1e51e..35eec2d 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -651,9 +651,10 @@ bnx2_napi_enable(struct bnx2 *bp)
 }
 
 static void
-bnx2_netif_stop(struct bnx2 *bp)
+bnx2_netif_stop(struct bnx2 *bp, bool stop_cnic)
 {
-	bnx2_cnic_stop(bp);
+	if (stop_cnic)
+		bnx2_cnic_stop(bp);
 	if (netif_running(bp->dev)) {
 		int i;
 
@@ -671,14 +672,15 @@ bnx2_netif_stop(struct bnx2 *bp)
 }
 
 static void
-bnx2_netif_start(struct bnx2 *bp)
+bnx2_netif_start(struct bnx2 *bp, bool start_cnic)
 {
 	if (atomic_dec_and_test(&bp->intr_sem)) {
 		if (netif_running(bp->dev)) {
 			netif_tx_wake_all_queues(bp->dev);
 			bnx2_napi_enable(bp);
 			bnx2_enable_int(bp);
-			bnx2_cnic_start(bp);
+			if (start_cnic)
+				bnx2_cnic_start(bp);
 		}
 	}
 }
@@ -6277,12 +6279,12 @@ bnx2_reset_task(struct work_struct *work)
 		return;
 	}
 
-	bnx2_netif_stop(bp);
+	bnx2_netif_stop(bp, true);
 
 	bnx2_init_nic(bp, 1);
 
 	atomic_set(&bp->intr_sem, 1);
-	bnx2_netif_start(bp);
+	bnx2_netif_start(bp, true);
 	rtnl_unlock();
 }
 
@@ -6324,7 +6326,7 @@ bnx2_vlan_rx_register(struct net_device *dev, struct vlan_group *vlgrp)
 	struct bnx2 *bp = netdev_priv(dev);
 
 	if (netif_running(dev))
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, false);
 
 	bp->vlgrp = vlgrp;
 
@@ -6335,7 +6337,7 @@ bnx2_vlan_rx_register(struct net_device *dev, struct vlan_group *vlgrp)
 	if (bp->flags & BNX2_FLAG_CAN_KEEP_VLAN)
 		bnx2_fw_sync(bp, BNX2_DRV_MSG_CODE_KEEP_VLAN_UPDATE, 0, 1);
 
-	bnx2_netif_start(bp);
+	bnx2_netif_start(bp, false);
 }
 #endif
 
@@ -7055,9 +7057,9 @@ bnx2_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal)
 	bp->stats_ticks &= BNX2_HC_STATS_TICKS_HC_STAT_TICKS;
 
 	if (netif_running(bp->dev)) {
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		bnx2_init_nic(bp, 0);
-		bnx2_netif_start(bp);
+		bnx2_netif_start(bp, true);
 	}
 
 	return 0;
@@ -7087,7 +7089,7 @@ bnx2_change_ring_size(struct bnx2 *bp, u32 rx, u32 tx)
 		/* Reset will erase chipset stats; save them */
 		bnx2_save_stats(bp);
 
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET);
 		bnx2_free_skbs(bp);
 		bnx2_free_mem(bp);
@@ -7115,7 +7117,7 @@ bnx2_change_ring_size(struct bnx2 *bp, u32 rx, u32 tx)
 			bnx2_setup_cnic_irq_info(bp);
 		mutex_unlock(&bp->cnic_lock);
 #endif
-		bnx2_netif_start(bp);
+		bnx2_netif_start(bp, true);
 	}
 	return 0;
 }
@@ -7368,7 +7370,7 @@ bnx2_self_test(struct net_device *dev, struct ethtool_test *etest, u64 *buf)
 	if (etest->flags & ETH_TEST_FL_OFFLINE) {
 		int i;
 
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_DIAG);
 		bnx2_free_skbs(bp);
 
@@ -7387,7 +7389,7 @@ bnx2_self_test(struct net_device *dev, struct ethtool_test *etest, u64 *buf)
 			bnx2_shutdown_chip(bp);
 		else {
 			bnx2_init_nic(bp, 1);
-			bnx2_netif_start(bp);
+			bnx2_netif_start(bp, true);
 		}
 
 		/* wait for link up */
@@ -8381,7 +8383,7 @@ bnx2_suspend(struct pci_dev *pdev, pm_message_t state)
 		return 0;
 
 	flush_scheduled_work();
-	bnx2_netif_stop(bp);
+	bnx2_netif_stop(bp, true);
 	netif_device_detach(dev);
 	del_timer_sync(&bp->timer);
 	bnx2_shutdown_chip(bp);
@@ -8403,7 +8405,7 @@ bnx2_resume(struct pci_dev *pdev)
 	bnx2_set_power_state(bp, PCI_D0);
 	netif_device_attach(dev);
 	bnx2_init_nic(bp, 1);
-	bnx2_netif_start(bp);
+	bnx2_netif_start(bp, true);
 	return 0;
 }
 
@@ -8430,7 +8432,7 @@ static pci_ers_result_t bnx2_io_error_detected(struct pci_dev *pdev,
 	}
 
 	if (netif_running(dev)) {
-		bnx2_netif_stop(bp);
+		bnx2_netif_stop(bp, true);
 		del_timer_sync(&bp->timer);
 		bnx2_reset_nic(bp, BNX2_DRV_MSG_CODE_RESET);
 	}
@@ -8487,7 +8489,7 @@ static void bnx2_io_resume(struct pci_dev *pdev)
 
 	rtnl_lock();
 	if (netif_running(dev))
-		bnx2_netif_start(bp);
+		bnx2_netif_start(bp, true);
 
 	netif_device_attach(dev);
 	rtnl_unlock();
-- 
1.6.4.GIT



^ permalink raw reply related

* [PATCH 3/3] bnx2: Update version to 2.0.9.
From: Michael Chan @ 2010-04-27 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, jfeeney
In-Reply-To: <1272403691-2934-2-git-send-email-mchan@broadcom.com>

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 35eec2d..ac90a38 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -58,8 +58,8 @@
 #include "bnx2_fw.h"
 
 #define DRV_MODULE_NAME		"bnx2"
-#define DRV_MODULE_VERSION	"2.0.8"
-#define DRV_MODULE_RELDATE	"Feb 15, 2010"
+#define DRV_MODULE_VERSION	"2.0.9"
+#define DRV_MODULE_RELDATE	"April 27, 2010"
 #define FW_MIPS_FILE_06		"bnx2/bnx2-mips-06-5.0.0.j6.fw"
 #define FW_RV2P_FILE_06		"bnx2/bnx2-rv2p-06-5.0.0.j3.fw"
 #define FW_MIPS_FILE_09		"bnx2/bnx2-mips-09-5.0.0.j9.fw"
-- 
1.6.4.GIT



^ permalink raw reply related

* [PATCH 1/3] bnx2: Fix lost MSI-X problem on 5709 NICs.
From: Michael Chan @ 2010-04-27 21:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, jfeeney

It has been reported that under certain heavy traffic conditions in MSI-X
mode, the driver can lose an MSI-X vector causing all packets in the
associated rx/tx ring pair to be dropped.  The problem is caused by
the chip dropping the write to unmask the MSI-X vector by the kernel
(when migrating the IRQ for example).

This can be prevented by increasing the GRC timeout value for these
register read and write operations.

Thanks to Dell for helping us debug this problem.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/bnx2.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a257bab..4c1e51e 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -4759,8 +4759,12 @@ bnx2_reset_chip(struct bnx2 *bp, u32 reset_code)
 		rc = bnx2_alloc_bad_rbuf(bp);
 	}
 
-	if (bp->flags & BNX2_FLAG_USING_MSIX)
+	if (bp->flags & BNX2_FLAG_USING_MSIX) {
 		bnx2_setup_msix_tbl(bp);
+		/* Prevent MSIX table reads and write from timing out */
+		REG_WR(bp, BNX2_MISC_ECO_HW_CTL,
+			BNX2_MISC_ECO_HW_CTL_LARGE_GRC_TMOUT_EN);
+	}
 
 	return rc;
 }
-- 
1.6.4.GIT



^ permalink raw reply related

* [net-next-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: Jeff Kirsher @ 2010-04-27 21:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Don Skidmore, Jeff Kirsher

From: Don Skidmore <donald.c.skidmore@intel.com>

The way we were setting autoneg via ethtool was inconsistent with that
of our other drivers.  This patch changes the following behavior:

If autoneg is off:
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  off
RX:             off
TX:             off

Before:
>ethtool -A eth0 autoneg on
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  off
RX:             off
TX:             off

Now:
>ethtool -A eth0 autoneg on
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  on
RX:             on
TX:             on

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index 5f8c6ab..dfbfe35 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -365,7 +365,7 @@ static int ixgbe_set_pauseparam(struct net_device *netdev,
 	else
 		fc.disable_fc_autoneg = false;
 
-	if (pause->rx_pause && pause->tx_pause)
+	if ((pause->rx_pause && pause->tx_pause) || pause->autoneg)
 		fc.requested_mode = ixgbe_fc_full;
 	else if (pause->rx_pause && !pause->tx_pause)
 		fc.requested_mode = ixgbe_fc_rx_pause;


^ permalink raw reply related

* [net-next-2.6 PATCH] ixgbe: Properly display 1 gig downshift warning for backplane
From: Jeff Kirsher @ 2010-04-27 21:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Anjali Singhai, Jeff Kirsher

From: Anjali Singhai <anjali.singhai@intel.com>

Description: When using Intel smartspeed, the patch displays a
warning when the link downshifts to 1 Gig.

Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_82599.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c
index d189ba7..38c3840 100644
--- a/drivers/net/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ixgbe/ixgbe_82599.c
@@ -642,6 +642,7 @@ static s32 ixgbe_setup_mac_link_smartspeed(struct ixgbe_hw *hw,
 	s32 i, j;
 	bool link_up = false;
 	u32 autoc_reg = IXGBE_READ_REG(hw, IXGBE_AUTOC);
+	struct ixgbe_adapter *adapter = hw->back;
 
 	hw_dbg(hw, "ixgbe_setup_mac_link_smartspeed.\n");
 
@@ -726,6 +727,10 @@ static s32 ixgbe_setup_mac_link_smartspeed(struct ixgbe_hw *hw,
 					    autoneg_wait_to_complete);
 
 out:
+	if (link_up && (link_speed == IXGBE_LINK_SPEED_1GB_FULL))
+		netif_info(adapter, hw, adapter->netdev, "Smartspeed has"
+			" downgraded the link speed from the maximum"
+			" advertised\n");
 	return status;
 }
 


^ permalink raw reply related

* [net-next-2.6 PATCH] ixgbevf: Fix link speed display
From: Jeff Kirsher @ 2010-04-27 21:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Greg Rose, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

The ixgbevf driver would always report 10Gig speeds even when the link
speed is downshifted to 1Gig.  This patch fixes that problem.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbevf/defines.h |   12 +++++++-----
 drivers/net/ixgbevf/vf.c      |    3 ++-
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ixgbevf/defines.h b/drivers/net/ixgbevf/defines.h
index c44fdb0..ca2c81f 100644
--- a/drivers/net/ixgbevf/defines.h
+++ b/drivers/net/ixgbevf/defines.h
@@ -41,11 +41,13 @@ typedef u32 ixgbe_link_speed;
 #define IXGBE_LINK_SPEED_1GB_FULL       0x0020
 #define IXGBE_LINK_SPEED_10GB_FULL      0x0080
 
-#define IXGBE_CTRL_RST          0x04000000 /* Reset (SW) */
-#define IXGBE_RXDCTL_ENABLE     0x02000000 /* Enable specific Rx Queue */
-#define IXGBE_TXDCTL_ENABLE     0x02000000 /* Enable specific Tx Queue */
-#define IXGBE_LINKS_UP          0x40000000
-#define IXGBE_LINKS_SPEED       0x20000000
+#define IXGBE_CTRL_RST              0x04000000 /* Reset (SW) */
+#define IXGBE_RXDCTL_ENABLE         0x02000000 /* Enable specific Rx Queue */
+#define IXGBE_TXDCTL_ENABLE         0x02000000 /* Enable specific Tx Queue */
+#define IXGBE_LINKS_UP              0x40000000
+#define IXGBE_LINKS_SPEED_82599     0x30000000
+#define IXGBE_LINKS_SPEED_10G_82599 0x30000000
+#define IXGBE_LINKS_SPEED_1G_82599  0x20000000
 
 /* Number of Transmit and Receive Descriptors must be a multiple of 8 */
 #define IXGBE_REQ_TX_DESCRIPTOR_MULTIPLE  8
diff --git a/drivers/net/ixgbevf/vf.c b/drivers/net/ixgbevf/vf.c
index 852e9c4..f6f9299 100644
--- a/drivers/net/ixgbevf/vf.c
+++ b/drivers/net/ixgbevf/vf.c
@@ -359,7 +359,8 @@ static s32 ixgbevf_check_mac_link_vf(struct ixgbe_hw *hw,
 	else
 		*link_up = false;
 
-	if (links_reg & IXGBE_LINKS_SPEED)
+	if ((links_reg & IXGBE_LINKS_SPEED_82599) ==
+	    IXGBE_LINKS_SPEED_10G_82599)
 		*speed = IXGBE_LINK_SPEED_10GB_FULL;
 	else
 		*speed = IXGBE_LINK_SPEED_1GB_FULL;
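
For illustration only (not part of the patch): a minimal standalone sketch
of why the old single-bit test reported 10Gig for a 1Gig link. The mask
values are copied from the defines.h hunk above. The helper names
old_check_is_10g() and new_check_is_10g(), the OLD_IXGBE_LINKS_SPEED alias
for the removed mask, and the use of <stdint.h> types instead of the
kernel's u32 are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the defines.h hunk above. */
#define IXGBE_LINKS_UP              0x40000000u
#define IXGBE_LINKS_SPEED_82599     0x30000000u /* two-bit speed field */
#define IXGBE_LINKS_SPEED_10G_82599 0x30000000u
#define IXGBE_LINKS_SPEED_1G_82599  0x20000000u

/* Hypothetical alias for the removed IXGBE_LINKS_SPEED single-bit mask. */
#define OLD_IXGBE_LINKS_SPEED       0x20000000u

/*
 * Old behaviour: tests bit 29 only. That bit is set in both the 1G
 * (0x20000000) and the 10G (0x30000000) encodings, so a 1G link is
 * misreported as 10G.
 */
static bool old_check_is_10g(uint32_t links_reg)
{
	return (links_reg & OLD_IXGBE_LINKS_SPEED) != 0;
}

/*
 * New behaviour: isolate the whole two-bit field, then compare it
 * against the 10G encoding.
 */
static bool new_check_is_10g(uint32_t links_reg)
{
	return (links_reg & IXGBE_LINKS_SPEED_82599) ==
	       IXGBE_LINKS_SPEED_10G_82599;
}
```

With a register value of IXGBE_LINKS_UP | IXGBE_LINKS_SPEED_1G_82599, the
old check returns true and the new check returns false. Both return true
for IXGBE_LINKS_UP | IXGBE_LINKS_SPEED_10G_82599.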


^ permalink raw reply related

* Re: [PATCH v2] net: reimplement softnet_data.output_queue as a FIFO queue
From: David Miller @ 2010-04-27 21:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev
In-Reply-To: <1272400618.2343.13.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 27 Apr 2010 22:36:58 +0200

> On Tuesday, 27 April 2010 at 17:06 +0800, Changli Gao wrote:
>> reimplement softnet_data.output_queue as a FIFO queue.
>> 
>> reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
>> the qdiscs rescheduled.
>> 
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [patch] ipheth: potential null dereferences on error path
From: David Miller @ 2010-04-27 21:33 UTC (permalink / raw)
  To: agimenez; +Cc: error27, diego, netdev, kernel-janitors
In-Reply-To: <20100427210003.GA13873@bart.evergreen.loc>

From: L. Alberto Giménez <agimenez@sysvalve.es>
Date: Tue, 27 Apr 2010 23:00:03 +0200

> On Tue, Apr 27, 2010 at 11:20:12AM +0200, Dan Carpenter wrote:
>> The calls to usb_free_buffer() dereference rx_urb and tx_urb in the
>> parameter list but those could be NULL.
>> 
>> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> Seems good to me (should I ack it or any other kind of sign-off?).

If you give it an "Acked-by: ..." that would be nice.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: David Miller @ 2010-04-27 21:36 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, donald.c.skidmore
In-Reply-To: <20100427213002.25913.93796.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 14:31:06 -0700

> From: Don Skidmore <donald.c.skidmore@intel.com>
> 
> The way we were setting autoneg via ethtool was inconsistent with that
> of our other drivers.  It will change the following:
...
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbe: Properly display 1 gig downshift warning for backplane
From: David Miller @ 2010-04-27 21:36 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, anjali.singhai
In-Reply-To: <20100427213124.25913.82475.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 14:31:25 -0700

> From: Anjali Singhai <anjali.singhai@intel.com>
> 
> Description: When using Intel smartspeed, the patch displays a
> warning when the link downshifts to 1 Gig.
> 
> Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbevf: Fix link speed display
From: David Miller @ 2010-04-27 21:36 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, gregory.v.rose
In-Reply-To: <20100427213143.25913.83381.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Apr 2010 14:31:45 -0700

> From: Greg Rose <gregory.v.rose@intel.com>
> 
> The ixgbevf driver would always report 10Gig speeds even when the link
> speed is downshifted to 1Gig.  This patch fixes that problem.
> 
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

