Netdev List
* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Lorenzo Colitti @ 2010-10-26  5:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20101025213841.635b9a15@nehalam>

On Mon, Oct 25, 2010 at 9:38 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> This is incorrect. When link is lost, routes and addresses should not be
> flushed. They should be marked as tentative and then go through DAD again
> on the new network.

That won't help the case I am trying to fix, which is the case where
the new link has a global prefix different than the old link. Marking
the addresses as tentative will simply make them pass DAD and come
back as soon as link comes back. But since they don't match the prefix
that is assigned to the new link, they are unusable, because packets
can't be routed back to them.

> If you do it this way, you break routing protocols when link is brought
> down and back up.

The only addresses and routes flushed in this way should be ones that
aren't manually configured, i.e., the ones created by autoconf
(addrconf.c:2720 onwards). These won't be used by routing protocols,
except for link-local addresses. So I assume you're talking about
link-local here.

Link-local addresses are immediately recreated in a tentative state as
soon as link comes back, because on NETDEV_UP addrconf_notify calls
addrconf_dev_config. So the only effect of this patch on them is that
they become tentative when link goes away and comes back. The router
that temporarily loses link is then unable to send packets for the
brief period in which DAD is running, but a router that has lost link
cannot send packets while link is down anyway.
What's the additional failure scenario? Will it help if I make it so
that link-local addresses aren't touched at all?

^ permalink raw reply
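
The prefix argument in the reply above can be illustrated with a small
userspace check. This is a sketch, not kernel code: in6_prefix_match() and
addr_usable_on_link() are hypothetical helper names, and the 2001:db8::/32
documentation prefixes in the usage note are made-up examples. DAD only
verifies that an address is unique on the link; return traffic additionally
needs the address to fall inside a prefix that is routed to that link, which
is what the check below tests.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Return true if addr lies inside prefix/plen (plen in bits, 0..128). */
bool in6_prefix_match(const struct in6_addr *addr,
		      const struct in6_addr *prefix, unsigned int plen)
{
	unsigned int full = plen / 8;	/* whole bytes to compare */
	unsigned int rem = plen % 8;	/* leftover high bits in the next byte */

	assert(plen <= 128);
	if (memcmp(addr->s6_addr, prefix->s6_addr, full) != 0)
		return false;
	if (rem) {
		unsigned char mask = (unsigned char)(0xff << (8 - rem));

		if ((addr->s6_addr[full] ^ prefix->s6_addr[full]) & mask)
			return false;
	}
	return true;
}

/* Hypothetical helper: parse the text forms, then test membership. */
bool addr_usable_on_link(const char *addr, const char *prefix,
			 unsigned int plen)
{
	struct in6_addr a, p;

	if (inet_pton(AF_INET6, addr, &a) != 1 ||
	    inet_pton(AF_INET6, prefix, &p) != 1)
		return false;
	return in6_prefix_match(&a, &p, plen);
}
```

For example, an address formed from 2001:db8:1::/64 on the old link matches
that prefix but not 2001:db8:2::/64 advertised on a new link, so it would be
unique there yet unreachable.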

* [PATCH] e1000e: add netpoll support for MSI/MSI-X IRQ modes
From: Dongdong Deng @ 2010-10-26  5:54 UTC (permalink / raw)
  To: davem, jesse, jeffrey.t.kirsher, bruce.w.allan
  Cc: alexander.h.duyck, carolyn.wyborny, donald.c.skidmore,
	gregory.v.rose, peter.p.waskiewicz.jr, john.ronciak,
	dongdong.deng, e1000-devel, netdev

With CONFIG_PCI_MSI enabled, e1000e can work in MSI/MSI-X IRQ mode,
but the netpoll controller does not handle those IRQ modes on e1000e.

This patch adds MSI/MSI-X handling to the netpoll controller,
so that netconsole can work with those IRQ modes.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
---
 drivers/net/e1000e/netdev.c |   49 +++++++++++++++++++++++++++++++++++++++---
 1 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index e561d15..36992ba 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -5369,6 +5369,36 @@ static void e1000_shutdown(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
+
+static irqreturn_t e1000_intr_msix(int irq, void *data)
+{
+	struct net_device *netdev = data;
+	struct e1000_adapter *adapter = netdev_priv(netdev);
+	int vector, msix_irq;
+
+	if (adapter->msix_entries) {
+		vector = 0;
+		msix_irq = adapter->msix_entries[vector].vector;
+		disable_irq(msix_irq);
+		e1000_intr_msix_rx(msix_irq, netdev);
+		enable_irq(msix_irq);
+
+		vector++;
+		msix_irq = adapter->msix_entries[vector].vector;
+		disable_irq(msix_irq);
+		e1000_intr_msix_tx(msix_irq, netdev);
+		enable_irq(msix_irq);
+
+		vector++;
+		msix_irq = adapter->msix_entries[vector].vector;
+		disable_irq(msix_irq);
+		e1000_msix_other(msix_irq, netdev);
+		enable_irq(msix_irq);
+	}
+
+	return IRQ_HANDLED;
+}
+
 /*
  * Polling 'interrupt' - used by things like netconsole to send skbs
  * without having to re-enable interrupts. It's not called while
@@ -5378,10 +5408,21 @@ static void e1000_netpoll(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-	disable_irq(adapter->pdev->irq);
-	e1000_intr(adapter->pdev->irq, netdev);
-
-	enable_irq(adapter->pdev->irq);
+	switch (adapter->int_mode) {
+	case E1000E_INT_MODE_MSIX:
+		e1000_intr_msix(adapter->pdev->irq, netdev);
+		break;
+	case E1000E_INT_MODE_MSI:
+		disable_irq(adapter->pdev->irq);
+		e1000_intr_msi(adapter->pdev->irq, netdev);
+		enable_irq(adapter->pdev->irq);
+		break;
+	default: /* E1000E_INT_MODE_LEGACY */
+		disable_irq(adapter->pdev->irq);
+		e1000_intr(adapter->pdev->irq, netdev);
+		enable_irq(adapter->pdev->irq);
+		break;
+	}
 }
 #endif
 
-- 
1.6.0.4


^ permalink raw reply related
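
The dispatch introduced by the patch above can be modelled in userspace. This
is only a sketch under stated assumptions: every model_* name is hypothetical,
the "handler" is reduced to a counter, and disable_irq()/enable_irq() are
replaced by a depth counter. It shows the shape of the change: in MSI-X mode
each vector's IRQ is masked around its own handler call, while MSI and legacy
modes mask the single device IRQ.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_IRQ	64
#define MODEL_NVEC	3	/* rx, tx, other: the three vectors the patch polls */

enum model_int_mode { MODEL_LEGACY, MODEL_MSI, MODEL_MSIX };

struct model_adapter {
	enum model_int_mode int_mode;
	int irq;			/* line used in legacy and MSI modes */
	int msix_irq[MODEL_NVEC];	/* one IRQ number per MSI-X vector */
	int disabled[MODEL_MAX_IRQ];	/* disable depth per IRQ */
	int serviced[MODEL_MAX_IRQ];	/* handler invocations per IRQ */
};

/* Stand-ins for disable_irq()/enable_irq(): they only track nesting. */
static void model_disable_irq(struct model_adapter *a, int irq)
{
	a->disabled[irq]++;
}

static void model_enable_irq(struct model_adapter *a, int irq)
{
	assert(a->disabled[irq] > 0);
	a->disabled[irq]--;
}

/* Run the "handler" for one IRQ with that IRQ masked around the call. */
static void model_poll_irq(struct model_adapter *a, int irq)
{
	assert(irq >= 0 && irq < MODEL_MAX_IRQ);
	model_disable_irq(a, irq);
	a->serviced[irq]++;	/* a real driver would call its ISR here */
	model_enable_irq(a, irq);
}

void model_netpoll(struct model_adapter *a)
{
	size_t v;

	switch (a->int_mode) {
	case MODEL_MSIX:
		for (v = 0; v < MODEL_NVEC; v++)
			model_poll_irq(a, a->msix_irq[v]);
		break;
	case MODEL_MSI:
	default:	/* MODEL_LEGACY */
		/* The patch calls different ISRs here; the model has only one. */
		model_poll_irq(a, a->irq);
		break;
	}
}
```

Unlike the patch, the model loops over the vectors instead of unrolling them,
which is a presentation choice and not a claim about what the driver should do.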

* Re: [PATCH 1/2 v3] xps: Improvements in TX queue selection
From: Helmut Schaa @ 2010-10-26  6:18 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, eric.dumazet
In-Reply-To: <alpine.DEB.1.00.1010211303390.30535@pokey.mtv.corp.google.com>

Hi,

Am Donnerstag 21 Oktober 2010 schrieb Tom Herbert:
> In dev_pick_tx, don't do work in calculating queue index or setting
> the index in the sock unless the device has more than one queue.  This
> allows the sock to be set only with a queue index of a multi-queue
> device which is desirable if devices are stacked like in a tunnel.
> 
> We also allow the mapping of a socket to queue to be changed.  To
> maintain in order packet transmission a flag (ooo_okay) has been
> added to the sk_buff structure.  If a transport layer sets this flag
> on a packet, the transmit queue can be changed for the socket.
> Presumably, the transport would set this if there was no possibility
> of creating OOO packets (for instance, there are no packets in flight
> for the socket).  This patch includes the modification in TCP output
> for setting this flag.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---

[...]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index b2269ac..a538ed5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2123,28 +2123,32 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
>  					struct sk_buff *skb)
>  {
>  	int queue_index;
> -	const struct net_device_ops *ops = dev->netdev_ops;
>  
> -	if (ops->ndo_select_queue) {
> -		queue_index = ops->ndo_select_queue(dev, skb);
> -		queue_index = dev_cap_txqueue(dev, queue_index);
> -	} else {
> +	if (dev->real_num_tx_queues > 1) {
>  		struct sock *sk = skb->sk;
> +
>  		queue_index = sk_tx_queue_get(sk);
> -		if (queue_index < 0) {
>  
> -			queue_index = 0;
> -			if (dev->real_num_tx_queues > 1)
> +		if (queue_index < 0 || skb->ooo_okay ||
> +		    queue_index >= dev->real_num_tx_queues) {
> +			const struct net_device_ops *ops = dev->netdev_ops;
> +			int old_index = queue_index;
> +
> +			if (ops->ndo_select_queue) {
> +				queue_index = ops->ndo_select_queue(dev, skb);
> +				queue_index = dev_cap_txqueue(dev, queue_index);
> +			} else
>  				queue_index = skb_tx_hash(dev, skb);

Wouldn't that break mac80211 QoS again for bridged AP mode interfaces (see
commit deabc772f39405054a438d711f408d2d94d26d96, "net: fix tx queue selection
for bridged devices implementing select_queue")?

Helmut

^ permalink raw reply
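
The control flow of the quoted hunk can be modelled in userspace to make the
question above concrete. This is a sketch with hypothetical names
(model_dev, model_pick_tx, model_qos_select); the single-queue branch is not
part of the quoted hunk and returning 0 there is an assumption. The model
suggests that a driver's select-queue hook is skipped whenever the socket
already carries a valid cached index and ooo_okay is not set, which appears to
be the situation the question is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	int real_num_tx_queues;
	/* Stands in for ndo_select_queue; NULL when the driver has no hook. */
	int (*select_queue)(int hash_hint);
};

/* Example hook: a driver that always wants queue 3 (say, a QoS class). */
int model_qos_select(int hash_hint)
{
	(void)hash_hint;
	return 3;
}

int model_pick_tx(const struct model_dev *dev, int cached_index,
		  bool ooo_okay, int hash_hint)
{
	int queue_index = cached_index;

	assert(dev != NULL && dev->real_num_tx_queues >= 1);
	assert(hash_hint >= 0);

	/* Assumption: this branch is not shown in the quoted hunk. */
	if (dev->real_num_tx_queues <= 1)
		return 0;

	if (queue_index < 0 || ooo_okay ||
	    queue_index >= dev->real_num_tx_queues) {
		if (dev->select_queue) {
			queue_index = dev->select_queue(hash_hint);
			/* Modelled on dev_cap_txqueue(): out of range maps to 0. */
			if (queue_index < 0 ||
			    queue_index >= dev->real_num_tx_queues)
				queue_index = 0;
		} else {
			/* Stand-in for skb_tx_hash(). */
			queue_index = hash_hint % dev->real_num_tx_queues;
		}
	}
	return queue_index;
}
```

With a four-queue device and the example hook, a socket with no cached index
lands on queue 3, but a socket whose cached index is 1 stays on queue 1 until
ooo_okay is set.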

* Re: [PATCH net-next-2.6 1/2] be2net: Adding an option to use INTx instead of MSI-X
From: Michael Ellerman @ 2010-10-26  6:52 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, somnath.kotur, netdev, linux-pci
In-Reply-To: <20101025.162535.226782713.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 5022 bytes --]

On Mon, 2010-10-25 at 16:25 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Mon, 25 Oct 2010 23:38:53 +0100
> 
> > David Miller wrote:
> >> From: Somnath Kotur <somnath.kotur@emulex.com>
> >> Date: Mon, 25 Oct 2010 16:42:35 +0530
> >> 
> >> > By default, be2net uses MSIx wherever possible.
> >> > Adding a module parameter to use INTx for users who do not want to use MSIx.
> >> > 
> >> > Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
> >> 
> >> Either add a new ethtool flag, or use the PCI subsystem facilities
> >> for tweaking things to implement this.
> >>
> >> Do not use a module option, otherwise every other networking driver
> >> author will get the same "cool" idea, give the module option
> >> different names, and the resulting user experience is terrible.
> > 
> > This has already happened, sadly.  So far as I can see it's mostly done
> > to allow users to work around systems with broken MSIs; I'm not aware of
> > any other reason to prefer legacy interrupts.  However, the PCI subsystem
> > already implements a blacklist and a kernel parameter for disabling MSIs
> > on these systems.
> 
> The PCI subsystem bits I'm totally fine with.
> 
> But in the drivers themselves, that's what I don't want.

That horse has really really bolted, it's gawn.

I count 26 drivers with "disable MSI/X" parameters. Some even have more
than one.

11 of them are network drivers, 9 scsi, 3 ata.

I agree it's a mess for users, but it's probably preferable to a
non-working driver.

Ethtool would be nice, but only for network drivers. Is there a generic
solution? Quirks are obviously not keeping people happy.

cheers



MSI:

Param "msi", 10 users:
  drivers/ata/sata_mv.c:MODULE_PARM_DESC(msi, "Enable use of PCI MSI (0=off, 1=on)");
  drivers/ata/sata_nv.c:MODULE_PARM_DESC(msi, "Enable use of MSI (Default: false)");
  drivers/ata/sata_sil24.c:MODULE_PARM_DESC(msi, "Enable MSI (Default: false)");
  drivers/net/cxgb3/cxgb3_main.c:MODULE_PARM_DESC(msi, "whether to use MSI or MSI-X");
  drivers/net/cxgb4/cxgb4_main.c:MODULE_PARM_DESC(msi, "whether to use INTx (0), MSI (1) or MSI-X (2)");
  drivers/net/cxgb4vf/cxgb4vf_main.c:MODULE_PARM_DESC(msi, "whether to use MSI-X or MSI");
  drivers/net/forcedeth.c:MODULE_PARM_DESC(msi, "MSI interrupts are enabled by setting to 1 and disabled by setting to 0.");
  drivers/net/qla3xxx.c:MODULE_PARM_DESC(msi, "Turn on Message Signaled Interrupts.");
  drivers/scsi/aacraid/aachba.c:MODULE_PARM_DESC(msi, "IRQ handling."
  drivers/scsi/stex.c:MODULE_PARM_DESC(msi, "Enable Message Signaled Interrupts(0=off, 1=on)");

Param "use_msi", 4 users:
  drivers/net/qlcnic/qlcnic_main.c:MODULE_PARM_DESC(use_msi, "MSI interrupt (0=disabled, 1=enabled");
  drivers/scsi/3w-9xxx.c:MODULE_PARM_DESC(use_msi, "Use Message Signaled Interrupts.  Default: 0");
  drivers/scsi/3w-sas.c:MODULE_PARM_DESC(use_msi, "Use Message Signaled Interrupts. Default: 0");
  drivers/scsi/aic94xx/aic94xx_init.c:MODULE_PARM_DESC(use_msi, "\n"

Param "disable_msi", 4 users:
  drivers/scsi/vmw_pvscsi.c:MODULE_PARM_DESC(disable_msi, "Disable MSI use in driver - (default=0)");
  drivers/net/bnx2.c:MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
  drivers/net/chelsio/cxgb2.c:MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
  drivers/net/sky2.c:MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");

Misc:
  sound/pci/hda/hda_intel.c:MODULE_PARM_DESC(enable_msi, "Enable Message Signaled Interrupt (MSI)");
  drivers/message/fusion/mptbase.c:MODULE_PARM_DESC(mpt_msi_enable_spi, " Enable MSI Support for SPI \
  drivers/message/fusion/mptbase.c:MODULE_PARM_DESC(mpt_msi_enable_fc, " Enable MSI Support for FC \
  drivers/message/fusion/mptbase.c:MODULE_PARM_DESC(mpt_msi_enable_sas, " Enable MSI Support for SAS \
  drivers/net/myri10ge/myri10ge.c:MODULE_PARM_DESC(myri10ge_msi, "Enable Message Signalled Interrupts");


MSI-X:

Param "msi_x", 2 users:
  drivers/infiniband/hw/mthca/mthca_main.c:MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");
  drivers/net/mlx4/main.c:MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");

Misc:
  drivers/net/forcedeth.c:MODULE_PARM_DESC(msix, "MSIX interrupts are enabled by setting to 1 and disabled by setting to 0.");
  drivers/net/qlcnic/qlcnic_main.c:MODULE_PARM_DESC(use_msi_x, "MSI-X interrupt (0=disabled, 1=enabled");
  drivers/scsi/bfa/bfad_intr.c:MODULE_PARM_DESC(msix_disable_cb, "Disable MSIX for Brocade-415/425/815/825"
  drivers/scsi/bfa/bfad_intr.c:MODULE_PARM_DESC(msix_disable_ct, "Disable MSIX for Brocade-1010/1020/804"
  drivers/scsi/mpt2sas/mpt2sas_base.c:MODULE_PARM_DESC(msix_disable, " disable msix routed interrupts (default=0)");
  drivers/scsi/qla4xxx/ql4_os.c:MODULE_PARM_DESC(ql4xenablemsix,
  drivers/scsi/vmw_pvscsi.c:MODULE_PARM_DESC(disable_msix, "Disable MSI-X use in driver - (default=0)");


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* [PATCH] tunnels: add __rcu annotations
From: Eric Dumazet @ 2010-10-26  7:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotations to:
        (struct ip_tunnel)->prl
        (struct ip_tunnel_prl_entry)->next
        (struct xfrm_tunnel)->next
	struct xfrm_tunnel *tunnel4_handlers
	struct xfrm_tunnel *tunnel64_handlers

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ipip.h |    4 ++--
 include/net/xfrm.h |    2 +-
 net/ipv4/tunnel4.c |   29 +++++++++++++++++++----------
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/net/ipip.h b/include/net/ipip.h
index 0403fe4..a32654d 100644
--- a/include/net/ipip.h
+++ b/include/net/ipip.h
@@ -34,12 +34,12 @@ struct ip_tunnel {
 #ifdef CONFIG_IPV6_SIT_6RD
 	struct ip_tunnel_6rd_parm	ip6rd;
 #endif
-	struct ip_tunnel_prl_entry	*prl;		/* potential router list */
+	struct ip_tunnel_prl_entry __rcu *prl;		/* potential router list */
 	unsigned int			prl_count;	/* # of entries in PRL */
 };
 
 struct ip_tunnel_prl_entry {
-	struct ip_tunnel_prl_entry	*next;
+	struct ip_tunnel_prl_entry __rcu *next;
 	__be32				addr;
 	u16				flags;
 	struct rcu_head			rcu_head;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index ffcd478..bcfb6b2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1264,7 +1264,7 @@ struct xfrm_tunnel {
 	int (*handler)(struct sk_buff *skb);
 	int (*err_handler)(struct sk_buff *skb, u32 info);
 
-	struct xfrm_tunnel *next;
+	struct xfrm_tunnel __rcu *next;
 	int priority;
 };
 
diff --git a/net/ipv4/tunnel4.c b/net/ipv4/tunnel4.c
index 9a17bd2..ac3b3ee 100644
--- a/net/ipv4/tunnel4.c
+++ b/net/ipv4/tunnel4.c
@@ -14,27 +14,32 @@
 #include <net/protocol.h>
 #include <net/xfrm.h>
 
-static struct xfrm_tunnel *tunnel4_handlers __read_mostly;
-static struct xfrm_tunnel *tunnel64_handlers __read_mostly;
+static struct xfrm_tunnel __rcu *tunnel4_handlers __read_mostly;
+static struct xfrm_tunnel __rcu *tunnel64_handlers __read_mostly;
 static DEFINE_MUTEX(tunnel4_mutex);
 
-static inline struct xfrm_tunnel **fam_handlers(unsigned short family)
+static inline struct xfrm_tunnel __rcu **fam_handlers(unsigned short family)
 {
 	return (family == AF_INET) ? &tunnel4_handlers : &tunnel64_handlers;
 }
 
 int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family)
 {
-	struct xfrm_tunnel **pprev;
+	struct xfrm_tunnel __rcu **pprev;
+	struct xfrm_tunnel *t;
+
 	int ret = -EEXIST;
 	int priority = handler->priority;
 
 	mutex_lock(&tunnel4_mutex);
 
-	for (pprev = fam_handlers(family); *pprev; pprev = &(*pprev)->next) {
-		if ((*pprev)->priority > priority)
+	for (pprev = fam_handlers(family);
+	     (t = rcu_dereference_protected(*pprev,
+			lockdep_is_held(&tunnel4_mutex))) != NULL;
+	     pprev = &t->next) {
+		if (t->priority > priority)
 			break;
-		if ((*pprev)->priority == priority)
+		if (t->priority == priority)
 			goto err;
 	}
 
@@ -52,13 +57,17 @@ EXPORT_SYMBOL(xfrm4_tunnel_register);
 
 int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family)
 {
-	struct xfrm_tunnel **pprev;
+	struct xfrm_tunnel __rcu **pprev;
+	struct xfrm_tunnel *t;
 	int ret = -ENOENT;
 
 	mutex_lock(&tunnel4_mutex);
 
-	for (pprev = fam_handlers(family); *pprev; pprev = &(*pprev)->next) {
-		if (*pprev == handler) {
+	for (pprev = fam_handlers(family);
+	     (t = rcu_dereference_protected(*pprev,
+			lockdep_is_held(&tunnel4_mutex))) != NULL;
+	     pprev = &t->next) {
+		if (t == handler) {
 			*pprev = handler->next;
 			ret = 0;
 			break;



^ permalink raw reply related
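
The register/deregister loops in the patch above walk the handler list through
a pointer-to-pointer so that insertion and removal need no special case for the
list head. A userspace sketch of that idiom follows. All names are
hypothetical, there is no mutex and no RCU, and the insertion step after the
loop is not visible in the quoted hunk, so it is an assumption here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_handler {
	struct model_handler *next;
	int priority;
};

/* Insert h in ascending priority order; reject a duplicate priority. */
int model_register(struct model_handler **head, struct model_handler *h)
{
	struct model_handler **pprev, *t;

	assert(head != NULL && h != NULL);
	for (pprev = head; (t = *pprev) != NULL; pprev = &t->next) {
		if (t->priority > h->priority)
			break;
		if (t->priority == h->priority)
			return -EEXIST;
	}
	/* Assumed insertion step: link h in front of *pprev. */
	h->next = *pprev;
	*pprev = h;
	return 0;
}

/* Unlink h if it is on the list. */
int model_deregister(struct model_handler **head, struct model_handler *h)
{
	struct model_handler **pprev, *t;

	assert(head != NULL && h != NULL);
	for (pprev = head; (t = *pprev) != NULL; pprev = &t->next) {
		if (t == h) {
			*pprev = h->next;
			return 0;
		}
	}
	return -ENOENT;
}
```

In the kernel version the loop variable t is what the patch obtains through
rcu_dereference_protected(), which documents that the mutex is what makes the
plain read safe.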

* [PATCH] ipv4: add __rcu annotations to routes.c
From: Eric Dumazet @ 2010-10-26  7:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotations to:
        (struct dst_entry)->rt_next
        (struct rt_hash_bucket)->chain

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/dst.h |    2 -
 net/ipv4/route.c  |   75 ++++++++++++++++++++++++++------------------
 2 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index a217c83..ffe9cb7 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -95,7 +95,7 @@ struct dst_entry {
 	unsigned long		lastuse;
 	union {
 		struct dst_entry *next;
-		struct rtable    *rt_next;
+		struct rtable __rcu *rt_next;
 		struct rt6_info   *rt6_next;
 		struct dn_route  *dn_next;
 	};
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d6cb2bf..987bf9a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -198,7 +198,7 @@ const __u8 ip_tos2prio[16] = {
  */
 
 struct rt_hash_bucket {
-	struct rtable	*chain;
+	struct rtable __rcu	*chain;
 };
 
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
@@ -280,7 +280,7 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 	struct rtable *r = NULL;
 
 	for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
-		if (!rt_hash_table[st->bucket].chain)
+		if (!rcu_dereference_raw(rt_hash_table[st->bucket].chain))
 			continue;
 		rcu_read_lock_bh();
 		r = rcu_dereference_bh(rt_hash_table[st->bucket].chain);
@@ -300,17 +300,17 @@ static struct rtable *__rt_cache_get_next(struct seq_file *seq,
 {
 	struct rt_cache_iter_state *st = seq->private;
 
-	r = r->dst.rt_next;
+	r = rcu_dereference_bh(r->dst.rt_next);
 	while (!r) {
 		rcu_read_unlock_bh();
 		do {
 			if (--st->bucket < 0)
 				return NULL;
-		} while (!rt_hash_table[st->bucket].chain);
+		} while (!rcu_dereference_raw(rt_hash_table[st->bucket].chain));
 		rcu_read_lock_bh();
-		r = rt_hash_table[st->bucket].chain;
+		r = rcu_dereference_bh(rt_hash_table[st->bucket].chain);
 	}
-	return rcu_dereference_bh(r);
+	return r;
 }
 
 static struct rtable *rt_cache_get_next(struct seq_file *seq,
@@ -721,19 +721,23 @@ static void rt_do_flush(int process_context)
 	for (i = 0; i <= rt_hash_mask; i++) {
 		if (process_context && need_resched())
 			cond_resched();
-		rth = rt_hash_table[i].chain;
+		rth = rcu_dereference_raw(rt_hash_table[i].chain);
 		if (!rth)
 			continue;
 
 		spin_lock_bh(rt_hash_lock_addr(i));
 #ifdef CONFIG_NET_NS
 		{
-		struct rtable ** prev, * p;
+		struct rtable __rcu **prev;
+		struct rtable *p;
 
-		rth = rt_hash_table[i].chain;
+		rth = rcu_dereference_protected(rt_hash_table[i].chain,
+			lockdep_is_held(rt_hash_lock_addr(i)));
 
 		/* defer releasing the head of the list after spin_unlock */
-		for (tail = rth; tail; tail = tail->dst.rt_next)
+		for (tail = rth; tail;
+		     tail = rcu_dereference_protected(tail->dst.rt_next,
+				lockdep_is_held(rt_hash_lock_addr(i))))
 			if (!rt_is_expired(tail))
 				break;
 		if (rth != tail)
@@ -741,8 +745,12 @@ static void rt_do_flush(int process_context)
 
 		/* call rt_free on entries after the tail requiring flush */
 		prev = &rt_hash_table[i].chain;
-		for (p = *prev; p; p = next) {
-			next = p->dst.rt_next;
+		for (p = rcu_dereference_protected(*prev,
+				lockdep_is_held(rt_hash_lock_addr(i)));
+		     p != NULL;
+		     p = next) {
+			next = rcu_dereference_protected(p->dst.rt_next,
+				lockdep_is_held(rt_hash_lock_addr(i)));
 			if (!rt_is_expired(p)) {
 				prev = &p->dst.rt_next;
 			} else {
@@ -752,14 +760,15 @@ static void rt_do_flush(int process_context)
 		}
 		}
 #else
-		rth = rt_hash_table[i].chain;
-		rt_hash_table[i].chain = NULL;
+		rth = rcu_dereference_protected(rt_hash_table[i].chain,
+			lockdep_is_held(rt_hash_lock_addr(i)));
+		rcu_assign_pointer(rt_hash_table[i].chain, NULL);
 		tail = NULL;
 #endif
 		spin_unlock_bh(rt_hash_lock_addr(i));
 
 		for (; rth != tail; rth = next) {
-			next = rth->dst.rt_next;
+			next = rcu_dereference_protected(rth->dst.rt_next, 1);
 			rt_free(rth);
 		}
 	}
@@ -790,7 +799,7 @@ static int has_noalias(const struct rtable *head, const struct rtable *rth)
 	while (aux != rth) {
 		if (compare_hash_inputs(&aux->fl, &rth->fl))
 			return 0;
-		aux = aux->dst.rt_next;
+		aux = rcu_dereference_protected(aux->dst.rt_next, 1);
 	}
 	return ONE;
 }
@@ -799,7 +808,8 @@ static void rt_check_expire(void)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
-	struct rtable *rth, **rthp;
+	struct rtable *rth;
+	struct rtable __rcu **rthp;
 	unsigned long samples = 0;
 	unsigned long sum = 0, sum2 = 0;
 	unsigned long delta;
@@ -825,11 +835,12 @@ static void rt_check_expire(void)
 
 		samples++;
 
-		if (*rthp == NULL)
+		if (rcu_dereference_raw(*rthp) == NULL)
 			continue;
 		length = 0;
 		spin_lock_bh(rt_hash_lock_addr(i));
-		while ((rth = *rthp) != NULL) {
+		while ((rth = rcu_dereference_protected(*rthp,
+					lockdep_is_held(rt_hash_lock_addr(i)))) != NULL) {
 			prefetch(rth->dst.rt_next);
 			if (rt_is_expired(rth)) {
 				*rthp = rth->dst.rt_next;
@@ -941,7 +952,8 @@ static int rt_garbage_collect(struct dst_ops *ops)
 	static unsigned long last_gc;
 	static int rover;
 	static int equilibrium;
-	struct rtable *rth, **rthp;
+	struct rtable *rth;
+	struct rtable __rcu **rthp;
 	unsigned long now = jiffies;
 	int goal;
 	int entries = dst_entries_get_fast(&ipv4_dst_ops);
@@ -995,7 +1007,8 @@ static int rt_garbage_collect(struct dst_ops *ops)
 			k = (k + 1) & rt_hash_mask;
 			rthp = &rt_hash_table[k].chain;
 			spin_lock_bh(rt_hash_lock_addr(k));
-			while ((rth = *rthp) != NULL) {
+			while ((rth = rcu_dereference_protected(*rthp,
+					lockdep_is_held(rt_hash_lock_addr(k)))) != NULL) {
 				if (!rt_is_expired(rth) &&
 					!rt_may_expire(rth, tmo, expire)) {
 					tmo >>= 1;
@@ -1071,7 +1084,7 @@ static int slow_chain_length(const struct rtable *head)
 
 	while (rth) {
 		length += has_noalias(head, rth);
-		rth = rth->dst.rt_next;
+		rth = rcu_dereference_protected(rth->dst.rt_next, 1);
 	}
 	return length >> FRACT_BITS;
 }
@@ -1079,9 +1092,9 @@ static int slow_chain_length(const struct rtable *head)
 static int rt_intern_hash(unsigned hash, struct rtable *rt,
 			  struct rtable **rp, struct sk_buff *skb, int ifindex)
 {
-	struct rtable	*rth, **rthp;
+	struct rtable	*rth, *cand;
+	struct rtable __rcu **rthp, **candp;
 	unsigned long	now;
-	struct rtable *cand, **candp;
 	u32 		min_score;
 	int		chain_length;
 	int attempts = !in_softirq();
@@ -1128,7 +1141,8 @@ restart:
 	rthp = &rt_hash_table[hash].chain;
 
 	spin_lock_bh(rt_hash_lock_addr(hash));
-	while ((rth = *rthp) != NULL) {
+	while ((rth = rcu_dereference_protected(*rthp,
+			lockdep_is_held(rt_hash_lock_addr(hash)))) != NULL) {
 		if (rt_is_expired(rth)) {
 			*rthp = rth->dst.rt_next;
 			rt_free(rth);
@@ -1324,12 +1338,14 @@ EXPORT_SYMBOL(__ip_select_ident);
 
 static void rt_del(unsigned hash, struct rtable *rt)
 {
-	struct rtable **rthp, *aux;
+	struct rtable __rcu **rthp;
+	struct rtable *aux;
 
 	rthp = &rt_hash_table[hash].chain;
 	spin_lock_bh(rt_hash_lock_addr(hash));
 	ip_rt_put(rt);
-	while ((aux = *rthp) != NULL) {
+	while ((aux = rcu_dereference_protected(*rthp,
+			lockdep_is_held(rt_hash_lock_addr(hash)))) != NULL) {
 		if (aux == rt || rt_is_expired(aux)) {
 			*rthp = aux->dst.rt_next;
 			rt_free(aux);
@@ -1346,7 +1362,8 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 {
 	int i, k;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
-	struct rtable *rth, **rthp;
+	struct rtable *rth;
+	struct rtable __rcu **rthp;
 	__be32  skeys[2] = { saddr, 0 };
 	int  ikeys[2] = { dev->ifindex, 0 };
 	struct netevent_redirect netevent;
@@ -1379,7 +1396,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 			unsigned hash = rt_hash(daddr, skeys[i], ikeys[k],
 						rt_genid(net));
 
-			rthp=&rt_hash_table[hash].chain;
+			rthp = &rt_hash_table[hash].chain;
 
 			while ((rth = rcu_dereference(*rthp)) != NULL) {
 				struct rtable *rt;



^ permalink raw reply related
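
Several hunks in the patch above annotate the same traversal: walk a hash
chain with "while ((rth = *rthp) != NULL)" and unlink entries in place by
writing "*rthp = rth->dst.rt_next". A userspace sketch of that removal loop is
shown here. The names are hypothetical, and the bucket spinlock, RCU and the
call to rt_free() are deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_entry {
	struct model_entry *next;
	bool expired;
};

/* Unlink every expired entry from *chain; return how many were removed. */
int model_chain_prune(struct model_entry **chain)
{
	struct model_entry **pp = chain, *e;
	int removed = 0;

	assert(chain != NULL);
	while ((e = *pp) != NULL) {
		if (e->expired) {
			*pp = e->next;	/* unlink; pp keeps pointing at the same slot */
			e->next = NULL;
			removed++;
		} else {
			pp = &e->next;	/* keep the entry, advance to its link */
		}
	}
	return removed;
}
```

Because pp always points at the link that leads to the current entry, removing
the first entry of a chain needs no separate code path.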

* [PATCH] net: add __rcu annotations to protocol
From: Eric Dumazet @ 2010-10-26  7:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotations to:
        struct net_protocol *inet_protos
        struct net_protocol *inet6_protos

And use appropriate casts to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/protocol.h |    4 ++--
 net/ipv4/protocol.c    |    8 +++++---
 net/ipv6/protocol.c    |    8 +++++---
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index f1effdd..dc07495 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -89,10 +89,10 @@ struct inet_protosw {
 #define INET_PROTOSW_PERMANENT 0x02  /* Permanent protocols are unremovable. */
 #define INET_PROTOSW_ICSK      0x04  /* Is this an inet_connection_sock? */
 
-extern const struct net_protocol *inet_protos[MAX_INET_PROTOS];
+extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
 
 #if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
-extern const struct inet6_protocol *inet6_protos[MAX_INET_PROTOS];
+extern const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
 #endif
 
 extern int	inet_add_protocol(const struct net_protocol *prot, unsigned char num);
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 65699c2..9ae5c01 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -28,7 +28,7 @@
 #include <linux/spinlock.h>
 #include <net/protocol.h>
 
-const struct net_protocol *inet_protos[MAX_INET_PROTOS] __read_mostly;
+const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
 
 /*
  *	Add a protocol handler to the hash tables
@@ -38,7 +38,8 @@ int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
 {
 	int hash = protocol & (MAX_INET_PROTOS - 1);
 
-	return !cmpxchg(&inet_protos[hash], NULL, prot) ? 0 : -1;
+	return !cmpxchg((const struct net_protocol **)&inet_protos[hash],
+			NULL, prot) ? 0 : -1;
 }
 EXPORT_SYMBOL(inet_add_protocol);
 
@@ -50,7 +51,8 @@ int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol)
 {
 	int ret, hash = protocol & (MAX_INET_PROTOS - 1);
 
-	ret = (cmpxchg(&inet_protos[hash], prot, NULL) == prot) ? 0 : -1;
+	ret = (cmpxchg((const struct net_protocol **)&inet_protos[hash],
+		       prot, NULL) == prot) ? 0 : -1;
 
 	synchronize_net();
 
diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c
index 9bb936a..9a7978f 100644
--- a/net/ipv6/protocol.c
+++ b/net/ipv6/protocol.c
@@ -25,13 +25,14 @@
 #include <linux/spinlock.h>
 #include <net/protocol.h>
 
-const struct inet6_protocol *inet6_protos[MAX_INET_PROTOS] __read_mostly;
+const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS] __read_mostly;
 
 int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char protocol)
 {
 	int hash = protocol & (MAX_INET_PROTOS - 1);
 
-	return !cmpxchg(&inet6_protos[hash], NULL, prot) ? 0 : -1;
+	return !cmpxchg((const struct inet6_protocol **)&inet6_protos[hash],
+			NULL, prot) ? 0 : -1;
 }
 EXPORT_SYMBOL(inet6_add_protocol);
 
@@ -43,7 +44,8 @@ int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char protocol
 {
 	int ret, hash = protocol & (MAX_INET_PROTOS - 1);
 
-	ret = (cmpxchg(&inet6_protos[hash], prot, NULL) == prot) ? 0 : -1;
+	ret = (cmpxchg((const struct inet6_protocol **)&inet6_protos[hash],
+		       prot, NULL) == prot) ? 0 : -1;
 
 	synchronize_net();
 



^ permalink raw reply related
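
The add/del functions touched by the patch above claim and release a table
slot with a single compare-and-exchange. A userspace sketch of that scheme,
using C11 atomics in place of the kernel's cmpxchg(), is given here. The
model_* names and the table size are hypothetical, and the synchronize_net()
wait that follows removal in the kernel is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_MAX_PROTOS 256	/* hypothetical table size, a power of two */

struct model_proto {
	const char *name;
};

static _Atomic(const struct model_proto *) model_protos[MODEL_MAX_PROTOS];

/* Claim the slot only if it is currently empty. */
int model_add_protocol(const struct model_proto *prot, unsigned char num)
{
	int hash = num & (MODEL_MAX_PROTOS - 1);
	const struct model_proto *expected = NULL;

	assert(prot != NULL);
	return atomic_compare_exchange_strong(&model_protos[hash],
					      &expected, prot) ? 0 : -1;
}

/* Clear the slot only if it still holds this exact handler. */
int model_del_protocol(const struct model_proto *prot, unsigned char num)
{
	int hash = num & (MODEL_MAX_PROTOS - 1);
	const struct model_proto *expected = prot;
	const struct model_proto *none = NULL;

	assert(prot != NULL);
	return atomic_compare_exchange_strong(&model_protos[hash],
					      &expected, none) ? 0 : -1;
}
```

A second registration for an occupied slot fails, and only the current owner
of a slot can remove itself, which matches the 0 / -1 return convention in the
patched functions.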

* Re: ath9k crashing the kernel
From: Jaswinder Singh @ 2010-10-26  7:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-wireless, netdev, ath9k-devel,
	Linus 
In-Reply-To: <AANLkTim=jD-C4DwDS2=KuQnhPOyP+Cnd-hUTdaZ7Opth@mail.gmail.com>

Hello,

On Tue, Oct 26, 2010 at 10:36 AM, Jaswinder Singh
<jaswinderlinux@gmail.com> wrote:
>
> ath9k is crashing the kernel :
>
> [   21.276554] BUG: spinlock bad magic on CPU#1, NetworkManager/1056
> [   21.277015]  lock: f5be80a8, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [   21.277015] Pid: 1056, comm: NetworkManager Not tainted 2.6.36-netbook+ #20
> [   21.277015] Call Trace:
> [   21.277015]  [<c14767a7>] ? printk+0xf/0x11
> [   21.277015]  [<c117b823>] spin_bug+0x7c/0x87
> [   21.301365]  [<c117b8bd>] do_raw_spin_lock+0x1e/0x125
> [   21.301365]  [<c1478d0a>] ? _raw_spin_unlock_bh+0x1a/0x1c
> [   21.301365]  [<c1478dc3>] _raw_spin_lock_irqsave+0x17/0x1c
> [   21.318857]  [<c1288a74>] ath9k_config+0x255/0x38b
> [   21.318857]  [<c1447bdb>] ieee80211_hw_config+0x10a/0x114
> [   21.328034]  [<c1453545>] ieee80211_do_open+0x3de/0x4cf
> [   21.328034]  [<c1452206>] ? ieee80211_check_concurrent_iface+0x21/0x13a
> [   21.328034]  [<c104c470>] ? raw_notifier_call_chain+0xc/0xe
> [   21.328034]  [<c1453691>] ieee80211_open+0x5b/0x5e
> [   21.328034]  [<c13947cf>] __dev_open+0x80/0xa9
> [   21.328034]  [<c13920bb>] __dev_change_flags+0xa1/0x116
> [   21.328034]  [<c1394723>] dev_change_flags+0x13/0x3f
> [   21.328034]  [<c139d568>] do_setlink+0x226/0x507
> [   21.328034]  [<c139d917>] rtnl_setlink+0xce/0xd4
> [   21.328034]  [<c11786d2>] ? copy_to_user+0x3a/0x118
> [   21.328034]  [<c139d849>] ? rtnl_setlink+0x0/0xd4
> [   21.328034]  [<c139df4d>] rtnetlink_rcv_msg+0x17e/0x194
> [   21.328034]  [<c139ddcf>] ? rtnetlink_rcv_msg+0x0/0x194
> [   21.328034]  [<c13a59c4>] netlink_rcv_skb+0x30/0x76
> [   21.328034]  [<c139ddc8>] rtnetlink_rcv+0x1b/0x22
> [   21.328034]  [<c13a5772>] netlink_unicast+0x1aa/0x20b
> [   21.328034]  [<c13a5eac>] netlink_sendmsg+0x22c/0x27a
> [   21.328034]  [<c13862b8>] sock_sendmsg+0xa5/0xbb
> [   21.328034]  [<c13862b8>] ? sock_sendmsg+0xa5/0xbb
> [   21.328034]  [<c138e05b>] ? verify_iovec+0x3e/0x6b
> [   21.328034]  [<c1386ab2>] sys_sendmsg+0x149/0x196
> [   21.328034]  [<c10826f5>] ? unlock_page+0x40/0x43
> [   21.328034]  [<c10952d5>] ? __do_fault+0x367/0x393
> [   21.328034]  [<c1096b8e>] ? handle_mm_fault+0x3bd/0x77a
> [   21.328034]  [<c10b6fb3>] ? destroy_inode+0x1f/0x30
> [   21.328034]  [<c138694e>] ? sys_recvmsg+0x2b/0x46
> [   21.328034]  [<c1387c00>] sys_socketcall+0x146/0x18b
> [   21.328034]  [<c1002850>] sysenter_do_call+0x12/0x26
> [   21.344846] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [   21.487994] atl1c 0000:03:00.0: irq 42 for MSI/MSI-X
> [   21.543870] ADDRCONF(NETDEV_UP): eth0: link is not ready
>
>
> Linux 2.6.36 f6f94e2ab1 is good
> and
> 229aebb873e2972 is bad
>

After further investigation, the bad commit is:

3430098ae463e31ab16926ac3eb295368a3ca5d9 is the first bad commit
commit 3430098ae463e31ab16926ac3eb295368a3ca5d9
Author: Felix Fietkau <nbd@openwrt.org>
Date:   Sun Oct 10 18:21:52 2010 +0200

    ath9k: implement channel utilization stats for survey

    Results for the active channel are updated whenever a new survey dump
    is requested, the old data is kept to allow multiple processes to
    make their own channel utilization averages.
    All other channels only contain the data for the last time that the
    hardware was on the channel, i.e. the last scan result or other
    off-channel activity.
    Running a background scan does not clear the data for the active
    channel.

    Signed-off-by: Felix Fietkau <nbd@openwrt.org>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>


Thanks,
--
Jaswinder Singh.
> I am trying to bisect it but I am getting compilation error :
>
>  LD      .tmp_vmlinux1
> drivers/built-in.o: In function `ath_do_set_opmode':
> /home/jaswinder/jaswinder-git/linux-2.6/drivers/net/wireless/ath/ath5k/base.c:567: undefined reference to `ath_opmode_to_string'
> /home/jaswinder/jaswinder-git/linux-2.6/drivers/net/wireless/ath/ath5k/base.c:568: undefined reference to `ath_opmode_to_string'
> make: *** [.tmp_vmlinux1] Error 1
> [jaswinder@ linux-2.6]$
>

^ permalink raw reply

* [PATCH 1/5] netdev: bfin_mac: clean up printk messages
From: Mike Frysinger @ 2010-10-26  7:40 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel

Use netdev_* and pr_* helper funcs for output rather than printk.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   65 +++++++++++++++++++++--------------------------
 1 files changed, 29 insertions(+), 36 deletions(-)
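
For illustration only, here is a minimal user-space sketch of what the
pr_fmt() definition in this patch buys: every pr_*() call picks up the
module-name prefix through string-literal concatenation, so call sites no
longer spell out DRV_NAME. This is not kernel code. MODNAME stands in for
KBUILD_MODNAME (which only the kernel build system defines), and my_pr_err(),
log_mdio_timeout() and has_prefix() are hypothetical helpers invented for the
example. The `##__VA_ARGS__` form is a GNU extension, which the kernel also
relies on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for KBUILD_MODNAME; hard-coded here as an assumption. */
#define MODNAME "bfin_mac"

/* Same shape as the definition in the patch: prefix glued on at compile time. */
#define pr_fmt(fmt) MODNAME ": " fmt

/* Hypothetical stand-in for pr_err(): formats into a caller-supplied buffer. */
#define my_pr_err(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)

/* Call site without any explicit driver name, as in the converted driver. */
static int log_mdio_timeout(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return my_pr_err(buf, len,
			 "wait MDC/MDIO transaction to complete timeout\n");
}

/* True when a formatted message carries the module-name prefix. */
static int has_prefix(const char *msg)
{
	return strncmp(msg, MODNAME ": ", strlen(MODNAME ": ")) == 0;
}
```

In the kernel the definition has to appear before the first #include, because
linux/printk.h supplies a default pr_fmt() when none is defined yet; that is
why the patch moves the defines to the top of the file.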

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index ce1e5e9..a572bcb 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -8,6 +8,11 @@
  * Licensed under the GPL-2 or later.
  */
 
+#define DRV_VERSION	"1.1"
+#define DRV_DESC	"Blackfin on-chip Ethernet MAC driver"
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/kernel.h>
@@ -41,12 +46,7 @@
 
 #include "bfin_mac.h"
 
-#define DRV_NAME	"bfin_mac"
-#define DRV_VERSION	"1.1"
-#define DRV_AUTHOR	"Bryan Wu, Luke Yang"
-#define DRV_DESC	"Blackfin on-chip Ethernet MAC driver"
-
-MODULE_AUTHOR(DRV_AUTHOR);
+MODULE_AUTHOR("Bryan Wu, Luke Yang");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION(DRV_DESC);
 MODULE_ALIAS("platform:bfin_mac");
@@ -189,8 +189,7 @@ static int desc_list_init(void)
 		/* allocate a new skb for next time receive */
 		new_skb = dev_alloc_skb(PKT_BUF_SZ + NET_IP_ALIGN);
 		if (!new_skb) {
-			printk(KERN_NOTICE DRV_NAME
-			       ": init: low on mem - packet dropped\n");
+			pr_notice("init: low on mem - packet dropped\n");
 			goto init_error;
 		}
 		skb_reserve(new_skb, NET_IP_ALIGN);
@@ -240,7 +239,7 @@ static int desc_list_init(void)
 
 init_error:
 	desc_list_free();
-	printk(KERN_ERR DRV_NAME ": kmalloc failed\n");
+	pr_err("kmalloc failed\n");
 	return -ENOMEM;
 }
 
@@ -259,8 +258,7 @@ static int bfin_mdio_poll(void)
 	while ((bfin_read_EMAC_STAADD()) & STABUSY) {
 		udelay(1);
 		if (timeout_cnt-- < 0) {
-			printk(KERN_ERR DRV_NAME
-			": wait MDC/MDIO transaction to complete timeout\n");
+			pr_err("wait MDC/MDIO transaction to complete timeout\n");
 			return -ETIMEDOUT;
 		}
 	}
@@ -350,9 +348,9 @@ static void bfin_mac_adjust_link(struct net_device *dev)
 					opmode &= ~RMII_10;
 					break;
 				default:
-					printk(KERN_WARNING
-						"%s: Ack!  Speed (%d) is not 10/100!\n",
-						DRV_NAME, phydev->speed);
+					netdev_warn(dev,
+						"Ack! Speed (%d) is not 10/100!\n",
+						phydev->speed);
 					break;
 				}
 				bfin_write_EMAC_OPMODE(opmode);
@@ -417,14 +415,13 @@ static int mii_probe(struct net_device *dev, int phy_mode)
 
 	/* now we are supposed to have a proper phydev, to attach to... */
 	if (!phydev) {
-		printk(KERN_INFO "%s: Don't found any phy device at all\n",
-			dev->name);
+		netdev_err(dev, "no phy device found\n");
 		return -ENODEV;
 	}
 
 	if (phy_mode != PHY_INTERFACE_MODE_RMII &&
 		phy_mode != PHY_INTERFACE_MODE_MII) {
-		printk(KERN_INFO "%s: Invalid phy interface mode\n", dev->name);
+		netdev_err(dev, "invalid phy interface mode\n");
 		return -EINVAL;
 	}
 
@@ -432,7 +429,7 @@ static int mii_probe(struct net_device *dev, int phy_mode)
 			0, phy_mode);
 
 	if (IS_ERR(phydev)) {
-		printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
+		netdev_err(dev, "could not attach PHY\n");
 		return PTR_ERR(phydev);
 	}
 
@@ -453,11 +450,10 @@ static int mii_probe(struct net_device *dev, int phy_mode)
 	lp->old_duplex = -1;
 	lp->phydev = phydev;
 
-	printk(KERN_INFO "%s: attached PHY driver [%s] "
-	       "(mii_bus:phy_addr=%s, irq=%d, mdc_clk=%dHz(mdc_div=%d)"
-	       "@sclk=%dMHz)\n",
-	       DRV_NAME, phydev->drv->name, dev_name(&phydev->dev), phydev->irq,
-	       MDC_CLK, mdc_div, sclk/1000000);
+	pr_info("attached PHY driver [%s] "
+	        "(mii_bus:phy_addr=%s, irq=%d, mdc_clk=%dHz(mdc_div=%d)@sclk=%dMHz)\n",
+	        phydev->drv->name, dev_name(&phydev->dev), phydev->irq,
+	        MDC_CLK, mdc_div, sclk/1000000);
 
 	return 0;
 }
@@ -502,7 +498,7 @@ bfin_mac_ethtool_setsettings(struct net_device *dev, struct ethtool_cmd *cmd)
 static void bfin_mac_ethtool_getdrvinfo(struct net_device *dev,
 					struct ethtool_drvinfo *info)
 {
-	strcpy(info->driver, DRV_NAME);
+	strcpy(info->driver, KBUILD_MODNAME);
 	strcpy(info->version, DRV_VERSION);
 	strcpy(info->fw_version, "N/A");
 	strcpy(info->bus_info, dev_name(&dev->dev));
@@ -827,8 +823,7 @@ static void bfin_tx_hwtstamp(struct net_device *netdev, struct sk_buff *skb)
 		while ((!(bfin_read_EMAC_PTP_ISTAT() & TXTL)) && (--timeout_cnt))
 			udelay(1);
 		if (timeout_cnt == 0)
-			printk(KERN_ERR DRV_NAME
-					": fails to timestamp the TX packet\n");
+			netdev_err(netdev, "timestamp the TX packet failed\n");
 		else {
 			struct skb_shared_hwtstamps shhwtstamps;
 			u64 ns;
@@ -1083,8 +1078,7 @@ static void bfin_mac_rx(struct net_device *dev)
 	 * we which case we simply drop the packet
 	 */
 	if (current_rx_ptr->status.status_word & RX_ERROR_MASK) {
-		printk(KERN_NOTICE DRV_NAME
-		       ": rx: receive error - packet dropped\n");
+		netdev_notice(dev, "rx: receive error - packet dropped\n");
 		dev->stats.rx_dropped++;
 		goto out;
 	}
@@ -1094,8 +1088,7 @@ static void bfin_mac_rx(struct net_device *dev)
 
 	new_skb = dev_alloc_skb(PKT_BUF_SZ + NET_IP_ALIGN);
 	if (!new_skb) {
-		printk(KERN_NOTICE DRV_NAME
-		       ": rx: low on mem - packet dropped\n");
+		netdev_notice(dev, "rx: low on mem - packet dropped\n");
 		dev->stats.rx_dropped++;
 		goto out;
 	}
@@ -1213,7 +1206,7 @@ static int bfin_mac_enable(struct phy_device *phydev)
 	int ret;
 	u32 opmode;
 
-	pr_debug("%s: %s\n", DRV_NAME, __func__);
+	pr_debug("%s\n", __func__);
 
 	/* Set RX DMA */
 	bfin_write_DMA1_NEXT_DESC_PTR(&(rx_list_head->desc_a));
@@ -1323,7 +1316,7 @@ static void bfin_mac_set_multicast_list(struct net_device *dev)
 	u32 sysctl;
 
 	if (dev->flags & IFF_PROMISC) {
-		printk(KERN_INFO "%s: set to promisc mode\n", dev->name);
+		netdev_info(dev, "set promisc mode\n");
 		sysctl = bfin_read_EMAC_OPMODE();
 		sysctl |= PR;
 		bfin_write_EMAC_OPMODE(sysctl);
@@ -1393,7 +1386,7 @@ static int bfin_mac_open(struct net_device *dev)
 	 * address using ifconfig eth0 hw ether xx:xx:xx:xx:xx:xx
 	 */
 	if (!is_valid_ether_addr(dev->dev_addr)) {
-		printk(KERN_WARNING DRV_NAME ": no valid ethernet hw addr\n");
+		netdev_warn(dev, "no valid ethernet hw addr\n");
 		return -EINVAL;
 	}
 
@@ -1558,7 +1551,7 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	bfin_mac_hwtstamp_init(ndev);
 
 	/* now, print out the card info, in a short format.. */
-	dev_info(&pdev->dev, "%s, Version %s\n", DRV_DESC, DRV_VERSION);
+	netdev_info(ndev, "%s, Version %s\n", DRV_DESC, DRV_VERSION);
 
 	return 0;
 
@@ -1650,7 +1643,7 @@ static int __devinit bfin_mii_bus_probe(struct platform_device *pdev)
 	 * so set the GPIO pins to Ethernet mode
 	 */
 	pin_req = mii_bus_pd->mac_peripherals;
-	rc = peripheral_request_list(pin_req, DRV_NAME);
+	rc = peripheral_request_list(pin_req, KBUILD_MODNAME);
 	if (rc) {
 		dev_err(&pdev->dev, "Requesting peripherals failed!\n");
 		return rc;
@@ -1739,7 +1732,7 @@ static struct platform_driver bfin_mac_driver = {
 	.resume = bfin_mac_resume,
 	.suspend = bfin_mac_suspend,
 	.driver = {
-		.name = DRV_NAME,
+		.name = KBUILD_MODNAME,
 		.owner	= THIS_MODULE,
 	},
 };
-- 
1.7.3.2


^ permalink raw reply related

* [PATCH 2/5] netdev: bfin_mac: mark setup_system_regs as static
From: Mike Frysinger @ 2010-10-26  7:40 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel
In-Reply-To: <1288078827-17495-1-git-send-email-vapier@gentoo.org>

No need for this to be exported since it is only used in this driver.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index a572bcb..e712be4 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -558,7 +558,7 @@ static const struct ethtool_ops bfin_mac_ethtool_ops = {
 };
 
 /**************************************************************************/
-void setup_system_regs(struct net_device *dev)
+static void setup_system_regs(struct net_device *dev)
 {
 	struct bfin_mac_local *lp = netdev_priv(dev);
 	int i;
-- 
1.7.3.2


^ permalink raw reply related

* [PATCH 3/5] netdev: bfin_mac: drop unused Mac data
From: Mike Frysinger @ 2010-10-26  7:40 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel
In-Reply-To: <1288078827-17495-1-git-send-email-vapier@gentoo.org>

We don't use this local "Mac" data anywhere (since we rely on the
netdev's storage), so punt it.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index aed68be..4827f6b 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -68,7 +68,6 @@ struct bfin_mac_local {
 	 */
 	struct net_device_stats stats;
 
-	unsigned char Mac[6];	/* MAC address of the board */
 	spinlock_t lock;
 
 	int wol;		/* Wake On Lan */
-- 
1.7.3.2


^ permalink raw reply related

* [PATCH 4/5] netdev: bfin_mac: let boards set vlan masks
From: Mike Frysinger @ 2010-10-26  7:40 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel
In-Reply-To: <1288078827-17495-1-git-send-email-vapier@gentoo.org>

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c   |    7 +++++++
 drivers/net/bfin_mac.h   |    3 +++
 include/linux/bfin_mac.h |    1 +
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index e712be4..0b9fc51 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -588,6 +588,10 @@ static void setup_system_regs(struct net_device *dev)
 
 	bfin_write_EMAC_MMC_CTL(RSTC | CROLL);
 
+	/* Set vlan regs to let 1522 bytes long packets pass through */
+	bfin_write_EMAC_VLAN1(lp->vlan1_mask);
+	bfin_write_EMAC_VLAN2(lp->vlan2_mask);
+
 	/* Initialize the TX DMA channel registers */
 	bfin_write_DMA2_X_COUNT(0);
 	bfin_write_DMA2_X_MODIFY(4);
@@ -1520,6 +1524,9 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 		goto out_err_mii_probe;
 	}
 
+	lp->vlan1_mask = ETH_P_8021Q | mii_bus_data->vlan1_mask;
+	lp->vlan2_mask = ETH_P_8021Q | mii_bus_data->vlan2_mask;
+
 	/* Fill in the fields of the device structure with ethernet values. */
 	ether_setup(ndev);
 
diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index 4827f6b..c1a0d66 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -75,6 +75,9 @@ struct bfin_mac_local {
 	struct timer_list tx_reclaim_timer;
 	struct net_device *ndev;
 
+	/* Data for EMAC_VLAN1 regs */
+	u16 vlan1_mask, vlan2_mask;
+
 	/* MII and PHY stuffs */
 	int old_link;          /* used by bf537_adjust_link */
 	int old_speed;
diff --git a/include/linux/bfin_mac.h b/include/linux/bfin_mac.h
index 904dec7..a69554e 100644
--- a/include/linux/bfin_mac.h
+++ b/include/linux/bfin_mac.h
@@ -24,6 +24,7 @@ struct bfin_mii_bus_platform_data {
 	const unsigned short *mac_peripherals;
 	int phy_mode;
 	unsigned int phy_mask;
+	unsigned short vlan1_mask, vlan2_mask;
 };
 
 #endif
-- 
1.7.3.2


^ permalink raw reply related

* [PATCH 5/5] netdev: bfin_mac: disable hardware checksum if writeback cache is enabled
From: Mike Frysinger @ 2010-10-26  7:40 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1288078827-17495-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

With writeback caches, corrupted RX packets will be sent up the stack
without any error markings.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index c1a0d66..f8559ac 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -17,7 +17,14 @@
 #include <linux/etherdevice.h>
 #include <linux/bfin_mac.h>
 
+/*
+ * Disable hardware checksum for bug #5600 if writeback cache is
+ * enabled. Otherwise, corrupted RX packet will be sent up stack
+ * without error mark.
+ */
+#ifndef CONFIG_BFIN_EXTMEM_WRITEBACK
 #define BFIN_MAC_CSUM_OFFLOAD
+#endif
 
 #define TX_RECLAIM_JIFFIES (HZ / 5)
 
-- 
1.7.3.2


^ permalink raw reply related

* dev_alloc_skb and latency issues
From: Jean-Michel Hautbois @ 2010-10-26  8:04 UTC (permalink / raw)
  To: netdev

Hi Everyone !

I am new to this mailing list, and I hope this kind of question hasn't
already been answered before (I didn't find anything in the archives).
I am facing some latency issues in the network layer (I am using a
bridge to transmit data from one interface to another).

I am focusing on the allocation of memory using alloc_skb for *every* new
packet, and the freeing of each packet before receiving a new one.
My use case is quite simple: I always have similar packets (some bytes
change, but the size is the same).
I don't think I am the only one with such a use case, and I am thinking
about an optimization for this case (and probably for others too): why do
we have to allocate from kmem_cache for every new packet?

We could probably add a little piece of code which would reuse the
buffer if nobody else still needs it.
My idea is that when the buffer is ready to be freed (in kfree_skb or
skb_release_all, for instance), we mark the skb as "free" but do not
actually free the memory.
On the next dev_alloc_skb, we check this mark and, if it is present, do
not allocate, and just "memset" the skb.
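
As a rough user-space sketch of the scheme described above (not kernel code,
and not how sk_buff allocation actually works), a single-slot recycle cache
could look like this. PKT_SIZE, struct pkt_buf, pkt_alloc() and pkt_free()
are all hypothetical names, and the sketch assumes a single thread and
ignores reference counts, clones and per-CPU concerns, which is where the
real complexity lies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PKT_SIZE 1536	/* assumed fixed buffer size for "similar" packets */

struct pkt_buf {
	unsigned char data[PKT_SIZE];
};

/* Single-slot cache: holds at most one buffer that was "freed". */
static struct pkt_buf *recycled;

/* Allocation path: reuse the parked buffer if there is one. */
static struct pkt_buf *pkt_alloc(void)
{
	struct pkt_buf *p = recycled;

	if (p) {
		recycled = NULL;
		memset(p, 0, sizeof(*p));	/* the "just memset" step */
		return p;
	}
	return calloc(1, sizeof(*p));
}

/* Free path: park the buffer instead of releasing it, if the slot is empty. */
static void pkt_free(struct pkt_buf *p)
{
	if (!p)
		return;
	assert(p != recycled);	/* catch a double free of the parked buffer */
	if (!recycled) {
		recycled = p;
		return;
	}
	free(p);
}
```

Even this toy version needs a policy for a full slot and a guard against
double frees; a real implementation would also need locking or per-CPU
slots.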

In my view, this would be really efficient when packets are similar.
Anyway, you probably have ideas about this, and I look forward to
your advice :).

Thanks in advance,
Best Regards,
JM

^ permalink raw reply

* Re: dev_alloc_skb and latency issues
From: Eric Dumazet @ 2010-10-26  8:25 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: netdev
In-Reply-To: <AANLkTikzQw3gHT8RZUD4cnGny5Xwh6_X==zFXBfi5muS@mail.gmail.com>

Le mardi 26 octobre 2010 à 10:04 +0200, Jean-Michel Hautbois a écrit :
> Hi Everyone !
> 
> I am new to this mailing list, and I hope this kind of question hasn't
> already been solved before (didn't find anything in the archives...).
> I am facing some latency issues in the network layer (I am using a
> bridge in order to transmit data between one interface to another).
> 
> I am focusing on allocation of memory using alloc_skb for *every* new
> packet, and freeing of each packet before receiving a new one.
> My use case is quite easy : I always have similar packets (some bytes
> are changed, but the size is the same).
> I don't think I am the only one with such a use case, and am thinking
> about an optimization in this case (probably for others too) : why do
> we have to allocate using kmem_cache for all the new packets ?
> 
> We could probably use a little piece of code which would reuse the
> buffer if it hasn't to be used by anyone else.
> I am thinking that if the buffer is ready to be freed (in kfree_skb or
> skb_release_all for instance) then, mark the skb as "free" but do not
> actually free memory.
> On the next dev_alloc_skb, check this mark, and if it is present, do
> not allocate, and just "memset" the skb.
> 
> This would be in my point of view really efficient when packets are similar.
> Anyway, you probably have ideas about that stuff, and I am waiting for
> your advices about that :).

Once you add all the code necessary to handle a new cache layer, you end up
in a situation where it brings nothing but extra cost and bugs (check the
recent discussion about the rx_recycle stuff in the gianfar driver).

Really, kmem_cache is pretty fast and scalable. If it is not, it is better
to work on that instead of adding yet another layer.




^ permalink raw reply

* [PATCH net-next-2.6] be2net: Schedule/Destroy worker thread in probe()/remove() rather than open()/close()
From: Somnath Kotur @ 2010-10-26  9:01 UTC (permalink / raw)
  To: netdev

When async MCC completions are received on an interface that is down (and so
interrupts are disabled), they just lie unprocessed in the completion queue.
The completion queue can eventually fill up and cause the BE to lock up. The
fix is to use be_worker to reap MCC completions when the interface is down.
be_worker is now launched in be_probe() and canceled in be_remove().

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/benet/be_main.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 45b1f66..deb546e 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1806,6 +1806,20 @@ static void be_worker(struct work_struct *work)
 	struct be_rx_obj *rxo;
 	int i;
 
+	/* when interrupts are not yet enabled, just reap any pending
+	* mcc completions */
+	if (!netif_running(adapter->netdev)) {
+		int mcc_compl, status = 0;
+
+		mcc_compl = be_process_mcc(adapter, &status);
+
+		if (mcc_compl) {
+			struct be_mcc_obj *mcc_obj = &adapter->mcc_obj;
+			be_cq_notify(adapter, mcc_obj->cq.id, false, mcc_compl);
+		}
+		goto reschedule;
+	}
+
 	if (!adapter->stats_ioctl_sent)
 		be_cmd_get_stats(adapter, &adapter->stats_cmd);
 
@@ -1824,6 +1838,7 @@ static void be_worker(struct work_struct *work)
 	if (!adapter->ue_detected)
 		be_detect_dump_ue(adapter);
 
+reschedule:
 	schedule_delayed_work(&adapter->work, msecs_to_jiffies(1000));
 }
 
@@ -2019,8 +2034,6 @@ static int be_close(struct net_device *netdev)
 	struct be_eq_obj *tx_eq = &adapter->tx_eq;
 	int vec, i;
 
-	cancel_delayed_work_sync(&adapter->work);
-
 	be_async_mcc_disable(adapter);
 
 	netif_stop_queue(netdev);
@@ -2085,8 +2098,6 @@ static int be_open(struct net_device *netdev)
 	/* Now that interrupts are on we can process async mcc */
 	be_async_mcc_enable(adapter);
 
-	schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
-
 	status = be_cmd_link_status_query(adapter, &link_up, &mac_speed,
 			&link_speed);
 	if (status)
@@ -2715,6 +2726,8 @@ static void __devexit be_remove(struct pci_dev *pdev)
 	if (!adapter)
 		return;
 
+	cancel_delayed_work_sync(&adapter->work);
+
 	unregister_netdev(adapter->netdev);
 
 	be_clear(adapter);
@@ -2870,6 +2883,7 @@ static int __devinit be_probe(struct pci_dev *pdev,
 		goto unsetup;
 
 	dev_info(&pdev->dev, "%s port %d\n", nic_name(pdev), adapter->port_num);
+	schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
 	return 0;
 
 unsetup:
-- 
1.5.6.1


^ permalink raw reply related

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2010-10-26  8:57 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <OF8735A2A4.74B4EE91-ON652577C5.0040D90B-652577C7.0056B99D@in.ibm.com>

On Mon, Oct 25, 2010 at 09:20:38PM +0530, Krishna Kumar2 wrote:
> > Krishna Kumar2/India/IBM@IBMIN wrote on 10/20/2010 02:24:52 PM:
> 
> Any feedback, comments, objections, issues or bugs about the
> patches? Please let me know if something needs to be done.
> 
> Some more test results:
> _____________________________________________________
>          Host->Guest BW (numtxqs=2)
> #       BW%     CPU%    RCPU%   SD%     RSD%
> _____________________________________________________

I think we discussed the need for external-to-guest testing
over 10G. For large messages we should not see any change,
but you should be able to get better numbers for small messages,
assuming an MQ NIC.

-- 
MST

^ permalink raw reply

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-10-26  9:08 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, Michael S. Tsirkin,
	netdev, rusty
In-Reply-To: <OF5C53E9CF.FFDF2CE7-ON652577C8.00191D14-652577C8.001C2154@LocalDomain>

Krishna Kumar2/India/IBM wrote on 10/26/2010 10:40:35 AM:

> > I am trying to wrap my head around kernel/user interface here.
> > E.g., will we need another incompatible change when we add multiple RX
> > queues?
>
> Though I added a 'mq' option to qemu, there shouldn't be
> any incompatibility between old and new qemu's wrt vhost
> and virtio-net drivers. So the old qemu will run new host
> and new guest without issues, and new qemu can also run
> old host and old guest. Multiple RXQ will also not add
> any incompatibility.
>
> With MQ RX, I will be able to remove the hueristic (idea
> from David Stevens).  The idea is: Guest sends out packets
> on, say TXQ#2, vhost#2 processes the packets but packets
> going out from host to guest might be sent out on a
> different RXQ, say RXQ#4.  Guest receives the packet on
> RXQ#4, and all future responses on that connection are sent
> on TXQ#4.  Now vhost#4 processes both RX and TX packets for
> this connection.  Without needing to hash on the connection,
> guest can make sure that the same vhost thread will handle
> a single connection.
>
> > Also need to think about how robust our single stream heuristic is,
> > e.g. what are the chances it will misdetect a bidirectional
> > UDP stream as a single TCP?

> I think it should not happen. The hueristic code gets
> called for handling just the transmit packets, packets
> that vhost sends out to the guest skip this path.
>
> I tested unidirectional and bidirectional UDP to confirm:
>
> 8 iterations of iperf tests, each iteration of 15 secs,
> result is the sum of all 8 iterations in Gbits/sec
> __________________________________________
> Uni-directional          Bi-directional
>   Org      New             Org      New
> __________________________________________
>   71.78    71.77           71.74   72.07
> __________________________________________

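The queue-pinning idea quoted above (future transmits for a connection go
out on the TXQ whose number matches the RXQ on which its last packet
arrived) can be sketched in user space as follows. This is only an
illustration: conn_txq[], note_rx() and select_txq() are hypothetical names,
connections are reduced to small integer ids, and NUM_QUEUES and MAX_CONNS
are arbitrary.

```c
#include <assert.h>

#define NUM_QUEUES 8	/* assumed number of TX/RX queue pairs */
#define MAX_CONNS 16	/* assumed size of the illustrative table */

/* Per-connection state: the queue the next transmit should use. */
static int conn_txq[MAX_CONNS];

/* Called when a packet for connection `conn` arrives on queue `rxq`. */
static void note_rx(int conn, int rxq)
{
	assert(conn >= 0 && conn < MAX_CONNS);
	assert(rxq >= 0 && rxq < NUM_QUEUES);
	conn_txq[conn] = rxq;	/* follow the receive queue from now on */
}

/* Queue to use for the next transmit on `conn`; no hashing involved. */
static int select_txq(int conn)
{
	assert(conn >= 0 && conn < MAX_CONNS);
	return conn_txq[conn];
}
```

With the example from the text, a connection that starts on TXQ#2 and then
receives on RXQ#4 returns 4 from select_txq() afterwards, so one vhost
thread ends up handling both directions.
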

Results for UDP BW tests (unidirectional, sum across
3 iterations, each iteration of 45 seconds, default
netperf, vhosts bound to cpus 0-3; no other tuning):

------ numtxqs=8, vhosts=5 ---------
#     BW%    CPU%    SD%
------------------------------------
1     .49    1.07     0
2    23.51   52.51    26.66
4    75.17   72.43    8.57
8    86.54   80.21    27.85
16   92.37   85.99    6.27
24   91.37   84.91    8.41
32   89.78   82.90    3.31
48   89.85   79.95   -3.57
64   85.83   80.28    2.22
80   88.90   79.47   -23.18
96   90.12   79.98    14.71
128  86.13   80.60    4.42
------------------------------------
BW: 71.3%, CPU: 80.4%, SD: 1.2%


------ numtxqs=16, vhosts=5 --------
#    BW%      CPU%     SD%
------------------------------------
1    1.80     0        0
2    19.81    50.68    26.66
4    57.31    52.77    8.57
8    108.44   88.19   -5.21
16   106.09   85.03   -4.44
24   102.34   84.23   -.82
32   102.77   82.71   -5.81
48   100.00   79.62   -7.29
64   96.86    79.75   -6.10
80   99.26    79.82   -27.34
96   94.79    80.02   -5.08
128  98.14    81.15   -15.25
------------------------------------
BW: 77.9%,  CPU: 80.4%,  SD: -13.6%

Thanks,

- KK


^ permalink raw reply

* RE: [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan accleration.
From: Vladislav Zolotarov @ 2010-10-26  9:14 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, netdev@vger.kernel.org, Hao Zheng, Eilon Greenstein
In-Reply-To: <AANLkTimjTZU_GWgBNADgAtw7ViztZD-cdhHLQYremz2A@mail.gmail.com>

> 
> Even though the fix prevents the panic, I'm still a little concerned
> that you ran into it at all when vlan stripping was disabled.  That
> function should only be called when a tag was received by the card.
> Is it possible that __vlan_hwaccel_put_tag is being called even in
> cases when no tag was stripped?  

Correct, and that's because you've patched the driver to do it the
following way:

if (le16_to_cpu(cqe->fast_path_cqe.pars_flags.flags) &
                     PARSING_FLAGS_VLAN)
        __vlan_hwaccel_put_tag(skb, le16_to_cpu(cqe->fast_path_cqe.vlan_tag));

The condition above will be TRUE regardless of whether VLAN stripping is
enabled or disabled, as the parsing flags come from our PARSER HW block and
simply indicate whether this frame has a VLAN header or not. When I disabled
VLAN stripping I should have fixed these lines too, but I think I didn't... ;)

> Maybe we made a mistake when converting the driver?

Since the driver now always configures VLAN stripping, regardless of the
kernel configuration, I think your patch was just fine. ;)

Thanks,
vlad





^ permalink raw reply

* [PATCH 0/4]qlcnic: bug fixes
From: Amit Kumar Salecha @ 2010-10-26  9:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty

Hi
  Series of 4 bug fixes. Please apply them to the net-2.6 branch.
-Amit

^ permalink raw reply

* [PATCH] qlcnic: dma address align check
From: Amit Kumar Salecha @ 2010-10-26  9:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty
In-Reply-To: <1288085882-11988-1-git-send-email-amit.salecha@qlogic.com>

Device requires tx_hw_consumer to be 64 byte aligned.
Tx desc size is 64 bytes, so allocate tx_hw_consumer together with the
tx desc ring.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_ctx.c |   35 +++++++++++++++--------------------
 1 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_ctx.c b/drivers/net/qlcnic/qlcnic_ctx.c
index 1cdc05d..21c9c28 100644
--- a/drivers/net/qlcnic/qlcnic_ctx.c
+++ b/drivers/net/qlcnic/qlcnic_ctx.c
@@ -418,18 +418,9 @@ int qlcnic_alloc_hw_resources(struct qlcnic_adapter *adapter)
 	recv_ctx = &adapter->recv_ctx;
 	tx_ring = adapter->tx_ring;
 
-	tx_ring->hw_consumer = (__le32 *)pci_alloc_consistent(pdev, sizeof(u32),
-						&tx_ring->hw_cons_phys_addr);
-	if (tx_ring->hw_consumer == NULL) {
-		dev_err(&pdev->dev, "failed to allocate tx consumer\n");
-		return -ENOMEM;
-	}
-	*(tx_ring->hw_consumer) = 0;
-
 	/* cmd desc ring */
-	addr = pci_alloc_consistent(pdev, TX_DESC_RINGSIZE(tx_ring),
-			&tx_ring->phys_addr);
-
+	addr = pci_alloc_consistent(pdev, TX_DESC_RINGSIZE(tx_ring) +
+			sizeof(u32), &tx_ring->phys_addr);
 	if (addr == NULL) {
 		dev_err(&pdev->dev, "failed to allocate tx desc ring\n");
 		err = -ENOMEM;
@@ -437,6 +428,17 @@ int qlcnic_alloc_hw_resources(struct qlcnic_adapter *adapter)
 	}
 
 	tx_ring->desc_head = (struct cmd_desc_type0 *)addr;
+	tx_ring->hw_consumer = (__le32 *)(((char *)addr) +
+					TX_DESC_RINGSIZE(tx_ring));
+	tx_ring->hw_cons_phys_addr = (dma_addr_t)(((char *)tx_ring->phys_addr) +
+					TX_DESC_RINGSIZE(tx_ring));
+	if (tx_ring->hw_cons_phys_addr & 0x3F) {
+		dev_err(&pdev->dev, "Device requires 64 byte aligned dma addr"
+			". dma_addr=%p\n", (void *)tx_ring->hw_cons_phys_addr);
+		err = -ENOMEM;
+		goto err_out_free;
+	}
+	*(tx_ring->hw_consumer) = 0;
 
 	for (ring = 0; ring < adapter->max_rds_rings; ring++) {
 		rds_ring = &recv_ctx->rds_rings[ring];
@@ -516,19 +518,12 @@ void qlcnic_free_hw_resources(struct qlcnic_adapter *adapter)
 	recv_ctx = &adapter->recv_ctx;
 
 	tx_ring = adapter->tx_ring;
-	if (tx_ring->hw_consumer != NULL) {
-		pci_free_consistent(adapter->pdev,
-				sizeof(u32),
-				tx_ring->hw_consumer,
-				tx_ring->hw_cons_phys_addr);
-		tx_ring->hw_consumer = NULL;
-	}
-
 	if (tx_ring->desc_head != NULL) {
 		pci_free_consistent(adapter->pdev,
-				TX_DESC_RINGSIZE(tx_ring),
+				TX_DESC_RINGSIZE(tx_ring) + sizeof(u32),
 				tx_ring->desc_head, tx_ring->phys_addr);
 		tx_ring->desc_head = NULL;
+		tx_ring->hw_consumer = NULL;
 	}
 
 	for (ring = 0; ring < adapter->max_rds_rings; ring++) {
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH] qlcnic: fix mac learning
From: Amit Kumar Salecha @ 2010-10-26  9:37 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty
In-Reply-To: <1288085882-11988-1-git-send-email-amit.salecha@qlogic.com>

In the failover bonding case, the same mac address can be programmed on
another slave function. Fw will delete the old entry (original func)
associated with that mac address. We need to reprogram the mac address if
failover happens again to the original function.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    1 +
 drivers/net/qlcnic/qlcnic_main.c |    5 +++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 26c37d3..a60ff17 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -942,6 +942,7 @@ struct qlcnic_ipaddr {
 #define QLCNIC_LOOPBACK_TEST		2
 
 #define QLCNIC_FILTER_AGE	80
+#define QLCNIC_READD_AGE	20
 #define QLCNIC_LB_MAX_FILTERS	64
 
 struct qlcnic_filter {
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index f047c7c..5a3ce08 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1860,6 +1860,11 @@ qlcnic_send_filter(struct qlcnic_adapter *adapter,
 	hlist_for_each_entry_safe(tmp_fil, tmp_hnode, n, head, fnode) {
 		if (!memcmp(tmp_fil->faddr, &src_addr, ETH_ALEN) &&
 			    tmp_fil->vlan_id == vlan_id) {
+
+			if (jiffies >
+			    (QLCNIC_READD_AGE * HZ + tmp_fil->ftime))
+				qlcnic_change_filter(adapter, src_addr, vlan_id,
+								tx_ring);
 			tmp_fil->ftime = jiffies;
 			return;
 		}
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH] qlcnic: reduce RX ring size for nic partition
From: Amit Kumar Salecha @ 2010-10-26  9:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Sony Chacko
In-Reply-To: <1288085882-11988-1-git-send-email-amit.salecha@qlogic.com>

From: Sony Chacko <sony.chacko@qlogic.com>

For a NIC Partition adapter, the rcv ring size can be reduced, as the
physical port is partitioned.

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h         |    4 ++++
 drivers/net/qlcnic/qlcnic_ethtool.c |   23 +++++------------------
 drivers/net/qlcnic/qlcnic_main.c    |   14 ++++++++++++--
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index a60ff17..6400e6a 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -146,11 +146,13 @@
 #define MAX_CMD_DESCRIPTORS		1024
 #define MAX_RCV_DESCRIPTORS_1G		4096
 #define MAX_RCV_DESCRIPTORS_10G 	8192
+#define MAX_RCV_DESCRIPTORS_VF		2048
 #define MAX_JUMBO_RCV_DESCRIPTORS_1G	512
 #define MAX_JUMBO_RCV_DESCRIPTORS_10G	1024
 
 #define DEFAULT_RCV_DESCRIPTORS_1G	2048
 #define DEFAULT_RCV_DESCRIPTORS_10G	4096
+#define DEFAULT_RCV_DESCRIPTORS_VF	1024
 #define MAX_RDS_RINGS                   2
 
 #define get_next_index(index, length)	\
@@ -971,6 +973,8 @@ struct qlcnic_adapter {
 	u16 num_txd;
 	u16 num_rxd;
 	u16 num_jumbo_rxd;
+	u16 max_rxd;
+	u16 max_jumbo_rxd;
 
 	u8 max_rds_rings;
 	u8 max_sds_rings;
diff --git a/drivers/net/qlcnic/qlcnic_ethtool.c b/drivers/net/qlcnic/qlcnic_ethtool.c
index 25e93a5..ec21d24 100644
--- a/drivers/net/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/qlcnic/qlcnic_ethtool.c
@@ -437,14 +437,8 @@ qlcnic_get_ringparam(struct net_device *dev,
 	ring->rx_jumbo_pending = adapter->num_jumbo_rxd;
 	ring->tx_pending = adapter->num_txd;
 
-	if (adapter->ahw.port_type == QLCNIC_GBE) {
-		ring->rx_max_pending = MAX_RCV_DESCRIPTORS_1G;
-		ring->rx_jumbo_max_pending = MAX_JUMBO_RCV_DESCRIPTORS_1G;
-	} else {
-		ring->rx_max_pending = MAX_RCV_DESCRIPTORS_10G;
-		ring->rx_jumbo_max_pending = MAX_JUMBO_RCV_DESCRIPTORS_10G;
-	}
-
+	ring->rx_max_pending = adapter->max_rxd;
+	ring->rx_jumbo_max_pending = adapter->max_jumbo_rxd;
 	ring->tx_max_pending = MAX_CMD_DESCRIPTORS;
 
 	ring->rx_mini_max_pending = 0;
@@ -472,24 +466,17 @@ qlcnic_set_ringparam(struct net_device *dev,
 		struct ethtool_ringparam *ring)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(dev);
-	u16 max_rcv_desc = MAX_RCV_DESCRIPTORS_10G;
-	u16 max_jumbo_desc = MAX_JUMBO_RCV_DESCRIPTORS_10G;
 	u16 num_rxd, num_jumbo_rxd, num_txd;
 
-
 	if (ring->rx_mini_pending)
 		return -EOPNOTSUPP;
 
-	if (adapter->ahw.port_type == QLCNIC_GBE) {
-		max_rcv_desc = MAX_RCV_DESCRIPTORS_1G;
-		max_jumbo_desc = MAX_JUMBO_RCV_DESCRIPTORS_10G;
-	}
-
 	num_rxd = qlcnic_validate_ringparam(ring->rx_pending,
-			MIN_RCV_DESCRIPTORS, max_rcv_desc, "rx");
+			MIN_RCV_DESCRIPTORS, adapter->max_rxd, "rx");
 
 	num_jumbo_rxd = qlcnic_validate_ringparam(ring->rx_jumbo_pending,
-			MIN_JUMBO_DESCRIPTORS, max_jumbo_desc, "rx jumbo");
+			MIN_JUMBO_DESCRIPTORS, adapter->max_jumbo_rxd,
+						"rx jumbo");
 
 	num_txd = qlcnic_validate_ringparam(ring->tx_pending,
 			MIN_CMD_DESCRIPTORS, MAX_CMD_DESCRIPTORS, "tx");
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 5a3ce08..7a298cd 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -656,13 +656,23 @@ qlcnic_check_options(struct qlcnic_adapter *adapter)
 
 	dev_info(&pdev->dev, "firmware v%d.%d.%d\n",
 			fw_major, fw_minor, fw_build);
-
 	if (adapter->ahw.port_type == QLCNIC_XGBE) {
-		adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_10G;
+		if (adapter->flags & QLCNIC_ESWITCH_ENABLED) {
+			adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_VF;
+			adapter->max_rxd = MAX_RCV_DESCRIPTORS_VF;
+		} else {
+			adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_10G;
+			adapter->max_rxd = MAX_RCV_DESCRIPTORS_10G;
+		}
+
 		adapter->num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
+		adapter->max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
+
 	} else if (adapter->ahw.port_type == QLCNIC_GBE) {
 		adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_1G;
 		adapter->num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
+		adapter->max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
+		adapter->max_rxd = MAX_RCV_DESCRIPTORS_1G;
 	}
 
 	adapter->msix_supported = !!use_msi_x;
-- 
1.6.0.2
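
The patch above moves the receive-ring limits out of the ethtool handlers and into per-adapter fields that are filled in once in qlcnic_check_options(). Below is a minimal standalone sketch of that selection logic, not the driver code itself. The descriptor counts are copied from the qlcnic.h hunk; the names `port_type`, `ring_limits`, `pick_ring_limits` and `clamp_ring_size` are hypothetical, and the clamp is only an assumed stand-in for qlcnic_validate_ringparam(), whose body is not shown in this thread.

```c
#include <stdint.h>

/* Descriptor counts copied from the qlcnic.h hunk in the patch. */
#define MAX_RCV_DESCRIPTORS_1G          4096
#define MAX_RCV_DESCRIPTORS_10G         8192
#define MAX_RCV_DESCRIPTORS_VF          2048
#define MAX_JUMBO_RCV_DESCRIPTORS_1G    512
#define MAX_JUMBO_RCV_DESCRIPTORS_10G   1024
#define DEFAULT_RCV_DESCRIPTORS_1G      2048
#define DEFAULT_RCV_DESCRIPTORS_10G     4096
#define DEFAULT_RCV_DESCRIPTORS_VF      1024

/* Hypothetical stand-ins for adapter->ahw.port_type and the adapter fields. */
enum port_type { PORT_GBE, PORT_XGBE };

struct ring_limits {
	uint16_t num_rxd;        /* current rx ring size */
	uint16_t max_rxd;        /* upper bound reported to ethtool */
	uint16_t num_jumbo_rxd;  /* current jumbo ring size */
	uint16_t max_jumbo_rxd;  /* upper bound for the jumbo ring */
};

/* Pick defaults and maxima once, as the patch does at probe time. */
static struct ring_limits pick_ring_limits(enum port_type type,
					   int eswitch_enabled)
{
	struct ring_limits l = { 0 };

	if (type == PORT_XGBE) {
		if (eswitch_enabled) {
			/* Smaller rings when the eswitch flag is set. */
			l.num_rxd = DEFAULT_RCV_DESCRIPTORS_VF;
			l.max_rxd = MAX_RCV_DESCRIPTORS_VF;
		} else {
			l.num_rxd = DEFAULT_RCV_DESCRIPTORS_10G;
			l.max_rxd = MAX_RCV_DESCRIPTORS_10G;
		}
		l.num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
		l.max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
	} else {
		l.num_rxd = DEFAULT_RCV_DESCRIPTORS_1G;
		l.max_rxd = MAX_RCV_DESCRIPTORS_1G;
		l.num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
		l.max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
	}
	return l;
}

/* Assumed behavior only: clamp a requested ring size into [min, max]. */
static uint16_t clamp_ring_size(uint32_t requested, uint16_t min, uint16_t max)
{
	if (requested < min)
		return min;
	if (requested > max)
		return max;
	return (uint16_t)requested;
}
```

With the limits stored per adapter, both the get and set ringparam paths can read the same max_rxd and max_jumbo_rxd values, so the 1G and 10G cases no longer need to be re-derived in each handler.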


^ permalink raw reply related

* [PATCH] qlcnic: define valid vlan id range
From: Amit Kumar Salecha @ 2010-10-26  9:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Sony Chacko
In-Reply-To: <1288085882-11988-1-git-send-email-amit.salecha@qlogic.com>

From: Sony Chacko <sony.chacko@qlogic.com>

VLAN ID 4095 is reserved and should not be used.

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 6400e6a..8ecc170 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -1134,7 +1134,7 @@ struct qlcnic_eswitch {
 #define MAX_RX_QUEUES		4
 #define DEFAULT_MAC_LEARN	1
 
-#define IS_VALID_VLAN(vlan)	(vlan >= MIN_VLAN_ID && vlan <= MAX_VLAN_ID)
+#define IS_VALID_VLAN(vlan)	(vlan >= MIN_VLAN_ID && vlan < MAX_VLAN_ID)
 #define IS_VALID_BW(bw)		(bw >= MIN_BW && bw <= MAX_BW)
 #define IS_VALID_TX_QUEUES(que)	(que > 0 && que <= MAX_TX_QUEUES)
 #define IS_VALID_RX_QUEUES(que)	(que > 0 && que <= MAX_RX_QUEUES)
-- 
1.6.0.2
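
The one-character change above turns an inclusive upper bound into an exclusive one, so the reserved ID is rejected. A minimal standalone sketch of the resulting check follows. The constants `EX_MIN_VLAN_ID` and `EX_MAX_VLAN_ID` and the helper `vlan_id_is_valid` are hypothetical; the driver's actual MIN_VLAN_ID and MAX_VLAN_ID definitions are not shown in this patch, and the values here are assumptions chosen so that 4095 is the excluded bound.

```c
#include <stdbool.h>

/* Assumed bounds for illustration; not taken from qlcnic.h. */
#define EX_MIN_VLAN_ID	1
#define EX_MAX_VLAN_ID	4095	/* reserved, so it is excluded below */

/*
 * Same comparison as the patched IS_VALID_VLAN(), written as a function
 * so the argument is evaluated once and needs no extra parentheses.
 */
static bool vlan_id_is_valid(unsigned int vlan)
{
	return vlan >= EX_MIN_VLAN_ID && vlan < EX_MAX_VLAN_ID;
}
```

Under these assumed bounds, 4094 is the highest ID accepted and 4095 is rejected, which matches the intent stated in the commit message.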


^ permalink raw reply related

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2010-10-26  9:38 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <OFBB51646E.01CC3630-ON652577C8.0030D6EF-652577C8.0031DA3C@in.ibm.com>

On Tue, Oct 26, 2010 at 02:38:53PM +0530, Krishna Kumar2 wrote:
> Results for UDP BW tests (unidirectional, sum across
> 3 iterations, each iteration of 45 seconds, default
> netperf, vhosts bound to cpus 0-3; no other tuning):

Is binding vhost threads to CPUs really required?
What happens if we let the scheduler do its job?

-- 
MST

^ permalink raw reply
