Netdev List
* how many msi (msi-x) vectors can be setup?
From: zhou rui @ 2010-05-19 15:18 UTC (permalink / raw)
  To: netdev

Hi there,
How many MSI (MSI-X) vectors can be set up?
Is the number limited by the hardware (the NIC) or by the kernel?
I found that the driver (Broadcom 57711, version 1.5.12) tried to request
16 queues on my 2.6.27 kernel, but only 2 were available.
Would that number increase if I updated the driver or the kernel?
And is there a system-wide limit? If other devices have already taken
too many MSI vectors, there may not be enough left.


thanks
rui

^ permalink raw reply

* Re: [PATCH 3/4] fec: add support for Freescale i.MX25 PDK (3DS)
From: Jean-Christophe Dubois @ 2010-05-19 15:15 UTC (permalink / raw)
  To: netdev, linux-arm-kernel; +Cc: Baruch Siach, Sascha Hauer, Greg Ungerer
In-Reply-To: <20100125112116.GF6724@jasper.tkos.co.il>

On Monday 25 January 2010, Baruch Siach wrote:
> Hi Greg, netdev,
> 
> On Wed, Dec 16, 2009 at 08:34:06AM +0200, Baruch Siach wrote:
> > On Wed, Dec 16, 2009 at 10:13:56AM +1000, Greg Ungerer wrote:
> > > Baruch Siach wrote:
> > > >On Tue, Dec 15, 2009 at 09:52:24PM +1000, Greg Ungerer wrote:
> > > >>On 12/15/2009 06:31 PM, Baruch Siach wrote:
> > > >>>+#ifndef CONFIG_M5272
> > > >>
> > > >>I would suggest making this conditional on FEC_MIIGSK_ENR.
> > > >>Although the CONFIG_M5272 is the only case here currently,
> > > >>that may change over the years. And using this here may not
> > > >>be obvious to the casual code reader, since the register
> > > >>offset definitions don't explicitly key on CONFIG_M5272.
> > > >
> > > >OK, I'll change this conditional.
> > > >
> > > >Can I take this as an Ack from you?
> > > 
> > > With that conditional check changed, sure:
> > > 
> > > Acked-by:  Greg Ungerer <gerg@uclinux.org>
> > 
> > Thanks. The updated patch below.
> 
> I'm really sorry to bug on this again, but since the platform code is
> already upstream the i.MX25 code doesn't build without this patch
> (include/linux/fec.h missing). So, someone please pick up this patch,
> preferably prior to .33.
> 
> baruch

I am just wondering if somebody is going to pick up this patch
(http://patchwork.ozlabs.org/patch/41235/) so that it finds its way into
mainline.

The i.MX25 PDK platform (3DStack) has been broken for 2 kernel releases (as
the current code depends on this patch). Is there hope that somebody picks
it up and gets it merged into mainline someday?

Or is it not acceptable in its current state for some reason?

Either the FEC changes need to be merged (possibly with some modification if
there are issues) or the part depending on them in the ARM tree needs to be
reverted. We should not stay in this inconsistent state.

JC

^ permalink raw reply

* Re: sky2 poweroff screws up my network
From: Stephen Hemminger @ 2010-05-19 15:14 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: netdev
In-Reply-To: <20100519143208.GL3900@ihatethathostname.lab.bos.redhat.com>

On Wed, 19 May 2010 10:32:08 -0400
Kyle McMartin <kmcmartin@redhat.com> wrote:

> Hi Stephen,
> 
> I've noticed a rather strange problem with the onboard sky2 on one of my
> machines. When I halt -p it, my router stops passing traffic completely
> until I kill the power to the machine entirely. If it's just been shut
> down with halt -p, it recovers immediately after turning it back on,
> while it's in BIOS.
> 
> Any thoughts on this? My guess is its because the sky2 shutdown puts the
> nic to sleep and when the host is off, bad things happen.
> 
> regards, Kyle

The sky2 shutdown puts the chip in Wake-on-LAN state; this
does a separate link speed negotiation (100 Mbit), which may be a problem
if speed/duplex is forced.

^ permalink raw reply

* sky2 poweroff screws up my network
From: Kyle McMartin @ 2010-05-19 14:32 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

Hi Stephen,

I've noticed a rather strange problem with the onboard sky2 on one of my
machines. When I halt -p it, my router stops passing traffic completely
until I kill the power to the machine entirely. If it's just been shut
down with halt -p, it recovers immediately after turning it back on,
while it's in BIOS.

Any thoughts on this? My guess is its because the sky2 shutdown puts the
nic to sleep and when the host is off, bad things happen.

regards, Kyle


^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-19 14:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neil Horman, Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <1274278229.2766.112.camel@edumazet-laptop>

On Wed, May 19, 2010 at 04:10:29PM +0200, Eric Dumazet wrote:
> On Wednesday 19 May 2010 at 08:05 -0400, Neil Horman wrote:
> > On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > > On Wednesday 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > > 
> > > > Another concern I have is about RPS.
> > > > 
> > > > netif_receive_skb() must be called from process_backlog() context, or
> > > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > > another cpu.
> > > 
> > > Hmm, I just checked again, and this is wrong.
> > > 
> > > In case we enqueue skb on a remote cpu backlog, we also
> > > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > > later.
> > > 
> > But if this happens, then we lose the connection between the packet being
> > received and the process doing the reception, so the network cgroup classifier
> > breaks again.
> > 
> > Performance gains are still a big advantage here of course.
> 
> RPS is enabled on a per device (or more precisely per subqueue) basis, and disabled
> by default, so if cgroup classifier is needed, it should work as is.
> 
> Speaking of net/sched/cls_cgroup.c, I am contemplating the following
> sequence:
> 
> rcu_read_lock();
> classid = task_cls_state(current)->classid;
> rcu_read_unlock();
> 
Yeah, I'd noticed there was no write side to that, but hadn't quite gotten to
investigating  that further :)

> RCU is definitely so special (should I say magic?) that we use it even
> if not needed. It makes us happy...
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: r8169 transmit queue time outs
From: Eric Dumazet @ 2010-05-19 14:18 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: Francois Romieu, netdev, dgilmore
In-Reply-To: <20100519134344.GI3900@ihatethathostname.lab.bos.redhat.com>

On Wednesday 19 May 2010 at 09:43 -0400, Kyle McMartin wrote:
> On Thu, May 06, 2010 at 10:10:24PM +0200, Francois Romieu wrote:
> > Kyle McMartin <kmcmartin@redhat.com> :
> > [...]
> > > Some of our users have been seeing their r8169 cards just up and stop
> > > transmitting packets pretty quickly after boot with recent kernels.
> > [...]
> > > Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.i686.PAE #1
> > 
> > Can they upgrade to 2.6.32.11-99.fc12.i686 and try an out-of-tree build
> > of the driver at http://userweb.kernel.org/~romieu/r8169/2.6.32.11-99.fc12/ ?
> > 
> > It should be quite close to the current git kernel.
> > 
> 
> I provided a bunch of testers with a backport of the current git head
> r8169 driver, and sadly, they report the TX timeout issues still occur.
> :/
> 
> Any other ideas?

Scratch the NIC ?

This is probably unrelated, but I once mentioned the following patch, which
could be tried. (It avoids resetting the NIC if we receive too many frames in a row.)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..c4dbb15 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4520,10 +4520,8 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 				dev->stats.rx_length_errors++;
 			if (status & RxCRC)
 				dev->stats.rx_crc_errors++;
-			if (status & RxFOVF) {
-				rtl8169_schedule_work(dev, rtl8169_reset_task);
+			if (status & RxFOVF)
 				dev->stats.rx_fifo_errors++;
-			}
 			rtl8169_mark_to_asic(desc, tp->rx_buf_sz);
 		} else {
 			struct sk_buff *skb = tp->Rx_skbuff[entry];



^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-19 14:10 UTC (permalink / raw)
  To: Neil Horman; +Cc: Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <20100519120547.GB26584@hmsreliant.think-freely.org>

On Wednesday 19 May 2010 at 08:05 -0400, Neil Horman wrote:
> On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > On Wednesday 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > 
> > > Another concern I have is about RPS.
> > > 
> > > netif_receive_skb() must be called from process_backlog() context, or
> > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > another cpu.
> > 
> > Hmm, I just checked again, and this is wrong.
> > 
> > In case we enqueue skb on a remote cpu backlog, we also
> > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > later.
> > 
> But if this happens, then we lose the connection between the packet being
> received and the process doing the reception, so the network cgroup classifier
> breaks again.
> 
> Performance gains are still a big advantage here of course.

RPS is enabled on a per device (or more precisely per subqueue) basis, and disabled
by default, so if cgroup classifier is needed, it should work as is.

Speaking of net/sched/cls_cgroup.c, I am contemplating the following
sequence:

rcu_read_lock();
classid = task_cls_state(current)->classid;
rcu_read_unlock();

RCU is definitely so special (should I say magic?) that we use it even
if not needed. It makes us happy...





^ permalink raw reply

* Re: r8169 transmit queue time outs
From: Kyle McMartin @ 2010-05-19 13:48 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: Francois Romieu, netdev, dgilmore
In-Reply-To: <20100519134344.GI3900@ihatethathostname.lab.bos.redhat.com>

On Wed, May 19, 2010 at 09:43:44AM -0400, Kyle McMartin wrote:
> I provided a bunch of testers with a backport of the current git head
> r8169 driver, and sadly, they report the TX timeout issues still occur.
> :/
> 
> Any other ideas?
> 

They note that the vendor driver seems to work without these transmit
timeouts... Would it be worthwhile for me to take a look and compare the
TX setup between the current git head and the latest vendor driver?

regards, Kyle

^ permalink raw reply

* Re: r8169 transmit queue time outs
From: Kyle McMartin @ 2010-05-19 13:43 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Kyle McMartin, netdev, dgilmore
In-Reply-To: <20100506201024.GA3541@electric-eye.fr.zoreil.com>

On Thu, May 06, 2010 at 10:10:24PM +0200, Francois Romieu wrote:
> Kyle McMartin <kmcmartin@redhat.com> :
> [...]
> > Some of our users have been seeing their r8169 cards just up and stop
> > transmitting packets pretty quickly after boot with recent kernels.
> [...]
> > Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.i686.PAE #1
> 
> Can they upgrade to 2.6.32.11-99.fc12.i686 and try an out-of-tree build
> of the driver at http://userweb.kernel.org/~romieu/r8169/2.6.32.11-99.fc12/ ?
> 
> It should be quite close to the current git kernel.
> 

I provided a bunch of testers with a backport of the current git head
r8169 driver, and sadly, they report the TX timeout issues still occur.
:/

Any other ideas?

regards, Kyle

^ permalink raw reply

* [PATCH net-next-2.6] bonding: optimize tlb_get_least_loaded_slave
From: Jiri Pirko @ 2010-05-19 13:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, bonding-devel

In the worst case, when the first loop breaks at the end of the slave list,
the slave list is iterated through twice. This patch reduces the
function to a single loop, which also makes it simpler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_alb.c |   33 +++++++++++++--------------------
 1 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 40fdc41..25c14c6 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -233,34 +233,27 @@ static void tlb_deinitialize(struct bonding *bond)
 	_unlock_tx_hashtbl(bond);
 }
 
+static long long compute_gap(struct slave *slave)
+{
+	return (s64) (slave->speed << 20) - /* Convert to Megabit per sec */
+	       (s64) (SLAVE_TLB_INFO(slave).load << 3); /* Bytes to bits */
+}
+
 /* Caller must hold bond lock for read */
 static struct slave *tlb_get_least_loaded_slave(struct bonding *bond)
 {
 	struct slave *slave, *least_loaded;
-	s64 max_gap;
-	int i, found = 0;
-
-	/* Find the first enabled slave */
-	bond_for_each_slave(bond, slave, i) {
-		if (SLAVE_IS_OK(slave)) {
-			found = 1;
-			break;
-		}
-	}
-
-	if (!found) {
-		return NULL;
-	}
+	long long max_gap;
+	int i;
 
-	least_loaded = slave;
-	max_gap = (s64)(slave->speed << 20) - /* Convert to Megabit per sec */
-			(s64)(SLAVE_TLB_INFO(slave).load << 3); /* Bytes to bits */
+	least_loaded = NULL;
+	max_gap = LLONG_MIN;
 
 	/* Find the slave with the largest gap */
-	bond_for_each_slave_from(bond, slave, i, least_loaded) {
+	bond_for_each_slave(bond, slave, i) {
 		if (SLAVE_IS_OK(slave)) {
-			s64 gap = (s64)(slave->speed << 20) -
-					(s64)(SLAVE_TLB_INFO(slave).load << 3);
+			long long gap = compute_gap(slave);
+
 			if (max_gap < gap) {
 				least_loaded = slave;
 				max_gap = gap;
-- 
1.6.6.1
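
For readers following along outside the kernel tree, the single-pass selection
described above can be sketched in plain C. This is only an illustration under
stated assumptions: struct demo_slave and the demo_* helpers are hypothetical
stand-ins, not the bonding driver's types, and the sketch widens the operands
to long long before shifting, whereas the patch shifts first and casts
afterwards.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's struct slave; not the kernel type. */
struct demo_slave {
	int ok;			/* plays the role of SLAVE_IS_OK() */
	unsigned int speed;	/* link speed in Mbit/s */
	unsigned int load;	/* recent transmit load in bytes */
};

/* Gap between capacity and load. Widening before the shift is a choice
 * made for this sketch only. */
static long long demo_gap(const struct demo_slave *s)
{
	return ((long long)s->speed << 20) - ((long long)s->load << 3);
}

/* One pass over the array: returns the usable slave with the largest gap,
 * or NULL when no slave is usable. */
static const struct demo_slave *
demo_least_loaded(const struct demo_slave *slaves, size_t n)
{
	const struct demo_slave *best = NULL;
	long long max_gap = LLONG_MIN;
	size_t i;

	assert(slaves != NULL || n == 0);

	for (i = 0; i < n; i++) {
		long long gap;

		if (!slaves[i].ok)
			continue;
		gap = demo_gap(&slaves[i]);
		if (max_gap < gap) {
			best = &slaves[i];
			max_gap = gap;
		}
	}
	return best;
}
```

Starting max_gap at LLONG_MIN is what lets the "find the first enabled slave"
loop go away: the first usable entry always wins the comparison, and NULL
falls out naturally when nothing is usable.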


^ permalink raw reply related

* Re: [GIT] Networking
From: John W. Linville @ 2010-05-19 13:09 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, torvalds, akpm, netdev, linux-kernel
In-Reply-To: <1274267582.2763.24.camel@localhost>

On Wed, May 19, 2010 at 12:13:02PM +0100, Ben Hutchings wrote:
> On Tue, 2010-05-18 at 23:37 -0700, David Miller wrote:

> > 3) PHY module autoloading support from David Howells

> For the record, that was actually David Woodhouse.

All Davids are the same... :-)

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-19 12:55 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <20100519120547.GB26584@hmsreliant.think-freely.org>

On Wed, May 19, 2010 at 08:05:47AM -0400, Neil Horman wrote:
> On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > On Wednesday 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > 
> > > Another concern I have is about RPS.
> > > 
> > > netif_receive_skb() must be called from process_backlog() context, or
> > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > another cpu.
> > 
> > Hmm, I just checked again, and this is wrong.
> > 
> > In case we enqueue skb on a remote cpu backlog, we also
> > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > later.
> > 
> But if this happens, then we lose the connection between the packet being
> received and the process doing the reception, so the network cgroup classifier
> breaks again.
> 
> Performance gains are still a big advantage here of course.
> Neil
> 
Scratch what I said here, Herbert corrected me on this, and we're ok, as tun has
no rps map.

I'll test this patch out in just a bit
Neil


^ permalink raw reply

* RE: any change in socket systemcall or packet_mmap regarding multiqueue nic?
From: Eric Dumazet @ 2010-05-19 12:28 UTC (permalink / raw)
  To: Jon Zhou; +Cc: netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497EFC946@MILEXCH2.ds.jdsu.net>

On Wednesday 19 May 2010 at 03:36 -0700, Jon Zhou wrote:
> 
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
> Sent: Wednesday, May 19, 2010 12:25 PM
> To: Jon Zhou
> Cc: netdev@vger.kernel.org
> Subject: Re: any change in socket systemcall or packet_mmap regarding multiqueue nic?
> 
> On Tuesday 18 May 2010 at 19:55 -0700, Jon Zhou wrote:
> > hi
> > the multiqueue networking can utilize multi-core to process packets from multiqueue nic,
> > but any change in related userspace application part, such as socket system call, packet_mmap? these userspace API can also utilize multicore to process packets from kernel?
> > otherwise they have to read data in serialization
> > 
> 
> Thats a bit general question. Works are in progress.
> 
> So far, you can use a new condition in filters to match a given queue
> index for incoming packets. A sniffer could setup N different sockets to
> receive data from N NIC queues.
> 
> jon->is it something like "ioctl(fd,SOL_SOCKET,queue_id...),could you tell the keyword?

The keywords are BPF (used in libpcap) and the SKF_AD_QUEUE instruction.

The kernel part is ready:

commit d19742fb1c68e6db83b76e06dea5a374c99e104f
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Tue Oct 20 01:06:22 2009 -0700

    filter: Add SKF_AD_QUEUE instruction
    
    It can help being able to filter packets on their queue_mapping.
    
    If filter performance is not good, we could add a "numqueue" field
    in struct packet_type, so that netif_nit_deliver() and other functions
    can directly ignore packets with not expected queue number.
    
    Lets experiment this simple filter extension first.
    
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 909193e..bb3b435 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -124,7 +124,8 @@ struct sock_fprog   /* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_NLATTR  12
 #define SKF_AD_NLATTR_NEST     16
 #define SKF_AD_MARK    20
-#define SKF_AD_MAX     24
+#define SKF_AD_QUEUE   24
+#define SKF_AD_MAX     28
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index e3987e1..08db7b9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -306,6 +306,9 @@ load_b:
                case SKF_AD_MARK:
                        A = skb->mark;
                        continue;
+               case SKF_AD_QUEUE:
+                       A = skb->queue_mapping;
+                       continue;
                case SKF_AD_NLATTR: {
                        struct nlattr *nla;
 




^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-19 12:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <1274257089.2766.7.camel@edumazet-laptop>

On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> On Wednesday 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> 
> > Another concern I have is about RPS.
> > 
> > netif_receive_skb() must be called from process_backlog() context, or
> > there is no guarantee the IPI will be sent if this skb is enqueued for
> > another cpu.
> 
> Hmm, I just checked again, and this is wrong.
> 
> In case we enqueue skb on a remote cpu backlog, we also
> do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> later.
> 
But if this happens, then we lose the connection between the packet being
received and the process doing the reception, so the network cgroup classifier
breaks again.

Performance gains are still a big advantage here of course.
Neil


^ permalink raw reply

* [PATCH net-next-2.6] bonding: remove unused original_flags struct slave member
From: Jiri Pirko @ 2010-05-19 11:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, bonding-devel

This is stored but never restored. So remove this as it is useless.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c |    5 -----
 drivers/net/bonding/bonding.h   |    1 -
 2 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4e7473e..ef60244 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1535,11 +1535,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		goto err_undo_flags;
 	}
 
-	/* save slave's original flags before calling
-	 * netdev_set_master and dev_open
-	 */
-	new_slave->original_flags = slave_dev->flags;
-
 	/* Save slave's original mtu and then set it to match the bond */
 	new_slave->original_mtu = slave_dev->mtu;
 	res = dev_set_mtu(slave_dev, bond->dev->mtu);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 2aa3367..da80964 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -159,7 +159,6 @@ struct slave {
 	s8     link;    /* one of BOND_LINK_XXXX */
 	s8     new_link;
 	s8     state;   /* one of BOND_STATE_XXXX */
-	u32    original_flags;
 	u32    original_mtu;
 	u32    link_failure_count;
 	u8     perm_hwaddr[ETH_ALEN];
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH net-next-2.6] bonding: move dev_addr cpy to bond_enslave
From: Jiri Pirko @ 2010-05-19 11:14 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, bonding-devel

Move the code that copies the slave's MAC address, in case it is the first
slave, into bond_enslave. The ifenslave app also does this, but that is not a
problem. This is something that should be done in bond_enslave, and it should
not matter where it is called from.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c  |    7 +++++++
 drivers/net/bonding/bond_sysfs.c |    8 --------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2c3f9db..4e7473e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1522,6 +1522,13 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		}
 	}
 
+	/* If this is the first slave, then we need to set the master's hardware
+	 * address to be the same as the slave's. */
+	if (bond->slave_cnt == 0)
+		memcpy(bond->dev->dev_addr, slave_dev->dev_addr,
+		       slave_dev->addr_len);
+
+
 	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
 	if (!new_slave) {
 		res = -ENOMEM;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a4cbaf7..496ac1e 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -250,14 +250,6 @@ static ssize_t bonding_store_slaves(struct device *d,
 	switch (command[0]) {
 	case '+':
 		pr_info("%s: Adding slave %s.\n", bond->dev->name, dev->name);
-
-		/* If this is the first slave, then we need to set
-		   the master's hardware address to be the same as the
-		   slave's. */
-		if (is_zero_ether_addr(bond->dev->dev_addr))
-			memcpy(bond->dev->dev_addr, dev->dev_addr,
-			       dev->addr_len);
-
 		res = bond_enslave(bond->dev, dev);
 		break;
 
-- 
1.6.6.1


^ permalink raw reply related

* Re: [GIT] Networking
From: Ben Hutchings @ 2010-05-19 11:13 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20100518.233752.256882583.davem@davemloft.net>

On Tue, 2010-05-18 at 23:37 -0700, David Miller wrote:
[...]
> Some other things that stand out:
> 
> 1) Allow the administrator to reserve port ranges, such that the
>    kernel bind allocation scheme won't use them.  From Amerigo Wang.
> 
> 2) ipv6 address et al. handling converted to use generic kernel lists.
>    From Stephen Hemminger.
> 
> 3) PHY module autoloading support from David Howells
[...]

For the record, that was actually David Woodhouse.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

^ permalink raw reply

* RE: any change in socket systemcall or packet_mmap regarding multiqueue nic?
From: Jon Zhou @ 2010-05-19 10:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <1274243074.2485.28.camel@edumazet-laptop>



-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
Sent: Wednesday, May 19, 2010 12:25 PM
To: Jon Zhou
Cc: netdev@vger.kernel.org
Subject: Re: any change in socket systemcall or packet_mmap regarding multiqueue nic?

On Tuesday 18 May 2010 at 19:55 -0700, Jon Zhou wrote:
> hi
> the multiqueue networking can utilize multi-core to process packets from multiqueue nic,
> but any change in related userspace application part, such as socket system call, packet_mmap? these userspace API can also utilize multicore to process packets from kernel?
> otherwise they have to read data in serialization
> 

That's a rather general question. Work is in progress.

So far, you can use a new condition in filters to match a given queue
index for incoming packets. A sniffer could setup N different sockets to
receive data from N NIC queues.

jon-> Is it something like "ioctl(fd, SOL_SOCKET, queue_id, ...)"? Could you tell me the keyword?

For tcp flows, nothing is needed, since all packets of a given flow
should use same queue.

By the way, do you think RFS is helpful for this?


However the current tx queue selection is based on sk->sk_hash value, a
linux side computed value, and this differs from the rx queue selection
done by the NIC firmware. So tx packets use a different queue than rx
packets for a given tcp flow. This means this is suboptimal: tcp_ack()
can run on a different cpu than TX completion handler.

TX completion handler touches the cloned skb that TCP used to transmit
buffer. Its freeing touches the dataref atomic counter in packet.

This should be addressed somehow.
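
To make the mismatch concrete: one common way to turn a 32-bit hash into a
queue index is to scale it into the queue range, as sketched below. The helper
name is hypothetical and this is not claimed to be the exact code path in the
stack; the point is only that a host-computed hash and a NIC-computed hash are
independent inputs, so the resulting tx and rx queue indices need not agree
for the same flow.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a 32-bit hash into [0, nqueues) without a division.
 * Hypothetical helper, for illustration only. */
static uint16_t demo_hash_to_queue(uint32_t hash, uint16_t nqueues)
{
	assert(nqueues > 0);
	return (uint16_t)(((uint64_t)hash * nqueues) >> 32);
}
```

With 8 queues, a host-side hash of 0x20000000 lands on queue 1 while a
NIC-side hash of 0x80000000 for the same flow lands on queue 4, which is the
kind of rx/tx split described above.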






^ permalink raw reply

* [PATCH] net/mpc52xx_phy: Various code cleanups
From: Wolfram Sang @ 2010-05-19 10:21 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: netdev, Wolfram Sang, Grant Likely

- don't free bus->irq (obsoleted by ca816d98170942371535b3e862813b0aba9b7d90)
- don't dispose irqs (should be done in of_mdiobus_register())
- use fec-pointer consistently in transfer()
- use resource_size()
- cosmetic fixes

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
---

My phyCORE-MPC5200B-tiny still passes its test-suite after the patch
(running via nfsroot).

 drivers/net/fec_mpc52xx_phy.c |   24 +++++-------------------
 1 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/net/fec_mpc52xx_phy.c b/drivers/net/fec_mpc52xx_phy.c
index 7658a08..35c90e0 100644
--- a/drivers/net/fec_mpc52xx_phy.c
+++ b/drivers/net/fec_mpc52xx_phy.c
@@ -29,15 +29,14 @@ static int mpc52xx_fec_mdio_transfer(struct mii_bus *bus, int phy_id,
 		int reg, u32 value)
 {
 	struct mpc52xx_fec_mdio_priv *priv = bus->priv;
-	struct mpc52xx_fec __iomem *fec;
+	struct mpc52xx_fec __iomem *fec = priv->regs;
 	int tries = 3;
 
 	value |= (phy_id << FEC_MII_DATA_PA_SHIFT) & FEC_MII_DATA_PA_MSK;
 	value |= (reg << FEC_MII_DATA_RA_SHIFT) & FEC_MII_DATA_RA_MSK;
 
-	fec = priv->regs;
 	out_be32(&fec->ievent, FEC_IEVENT_MII);
-	out_be32(&priv->regs->mii_data, value);
+	out_be32(&fec->mii_data, value);
 
 	/* wait for it to finish, this takes about 23 us on lite5200b */
 	while (!(in_be32(&fec->ievent) & FEC_IEVENT_MII) && --tries)
@@ -47,7 +46,7 @@ static int mpc52xx_fec_mdio_transfer(struct mii_bus *bus, int phy_id,
 		return -ETIMEDOUT;
 
 	return value & FEC_MII_DATA_OP_RD ?
-		in_be32(&priv->regs->mii_data) & FEC_MII_DATA_DATAMSK : 0;
+		in_be32(&fec->mii_data) & FEC_MII_DATA_DATAMSK : 0;
 }
 
 static int mpc52xx_fec_mdio_read(struct mii_bus *bus, int phy_id, int reg)
@@ -69,9 +68,8 @@ static int mpc52xx_fec_mdio_probe(struct of_device *of,
 	struct device_node *np = of->node;
 	struct mii_bus *bus;
 	struct mpc52xx_fec_mdio_priv *priv;
-	struct resource res = {};
+	struct resource res;
 	int err;
-	int i;
 
 	bus = mdiobus_alloc();
 	if (bus == NULL)
@@ -93,7 +91,7 @@ static int mpc52xx_fec_mdio_probe(struct of_device *of,
 	err = of_address_to_resource(np, 0, &res);
 	if (err)
 		goto out_free;
-	priv->regs = ioremap(res.start, res.end - res.start + 1);
+	priv->regs = ioremap(res.start, resource_size(&res));
 	if (priv->regs == NULL) {
 		err = -ENOMEM;
 		goto out_free;
@@ -118,10 +116,6 @@ static int mpc52xx_fec_mdio_probe(struct of_device *of,
  out_unmap:
 	iounmap(priv->regs);
  out_free:
-	for (i=0; i<PHY_MAX_ADDR; i++)
-		if (bus->irq[i] != PHY_POLL)
-			irq_dispose_mapping(bus->irq[i]);
-	kfree(bus->irq);
 	kfree(priv);
 	mdiobus_free(bus);
 
@@ -133,23 +127,16 @@ static int mpc52xx_fec_mdio_remove(struct of_device *of)
 	struct device *dev = &of->dev;
 	struct mii_bus *bus = dev_get_drvdata(dev);
 	struct mpc52xx_fec_mdio_priv *priv = bus->priv;
-	int i;
 
 	mdiobus_unregister(bus);
 	dev_set_drvdata(dev, NULL);
-
 	iounmap(priv->regs);
-	for (i=0; i<PHY_MAX_ADDR; i++)
-		if (bus->irq[i] != PHY_POLL)
-			irq_dispose_mapping(bus->irq[i]);
 	kfree(priv);
-	kfree(bus->irq);
 	mdiobus_free(bus);
 
 	return 0;
 }
 
-
 static struct of_device_id mpc52xx_fec_mdio_match[] = {
 	{ .compatible = "fsl,mpc5200b-mdio", },
 	{ .compatible = "fsl,mpc5200-mdio", },
@@ -168,5 +155,4 @@ struct of_platform_driver mpc52xx_fec_mdio_driver = {
 /* let fec driver call it, since this has to be registered before it */
 EXPORT_SYMBOL_GPL(mpc52xx_fec_mdio_driver);
 
-
 MODULE_LICENSE("Dual BSD/GPL");
-- 
1.7.0
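
As a note on the resource_size() change: the helper computes the same value as
the open-coded expression it replaces, because the bounds of a resource are
inclusive. A tiny userspace mirror, with hypothetical demo_* names standing in
for the kernel's struct resource and helper, shows the arithmetic:

```c
#include <assert.h>

/* Hypothetical stand-in for struct resource; both bounds are inclusive. */
struct demo_resource {
	unsigned long start;
	unsigned long end;
};

/* Same arithmetic as the "res.end - res.start + 1" removed by the patch. */
static unsigned long demo_resource_size(const struct demo_resource *res)
{
	assert(res->end >= res->start);
	return res->end - res->start + 1;
}
```

For a register window from 0x3000 to 0x33ff this yields 0x400 bytes; leaving
out the "+ 1" is the off-by-one that the helper guards against.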


^ permalink raw reply related

* Re: [PATCH 06/11] netdev: bfin_mac: avoid tx skb overflows in the tx DMA ring
From: Sonic Zhang @ 2010-05-19  9:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100518.122953.236217958.davem@davemloft.net>

On Wed, May 19, 2010 at 3:29 AM, David Miller <davem@davemloft.net> wrote:
> From: Sonic Zhang <sonic.adi@gmail.com>
> Date: Tue, 18 May 2010 18:52:17 +0800
>
> [ Please never drop the mailing list when you're trying to
>  discuss something networking related with me, thanks.
>  I've put the CC: back. ]
>
>> On Mon, May 10, 2010 at 7:40 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Mike Frysinger <vapier@gentoo.org>
>>> Date: Sun,  9 May 2010 06:18:52 -0400
>>>
>>>> From: Sonic Zhang <sonic.zhang@analog.com>
>>>>
>>>> Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
>>>> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
>>>
>>> This should never ever happen, it's a bug and you should print a warning
>>> message when and if it does actually occur.
>>>
>>> At any point where your ->next pointer hits tx_list_head, the queue
>>> should have been stopped by your driver and therefore the networking
>>> core will never pass another packet to you.
>>
>> To reduce the tx output overhead, the tx interrupt is not enabled and
>> handled in this driver. So, the driver doesn't know when to restart
>> the tx queue if the queue is stopped in ndo_start_xmit() at the point
>> where next pointer hits tx_list_head.
>>
>> In this case, although TX buffer list full rarely happens, the check
>> is still a safeguard.
>
> You can't do that, if you don't use the TX interrupt then you can leave
> SKBs stale in your TX ring for indefinite periods of time which is
> illegal.

No, this doesn't happen, because before ndo_start_xmit() returns, the
old TX buffers and skbs in the ring, which finished DMA operation, are
freed. The only difference is that the free operation of a skb is done
in next tx transfer.

Sonic
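
To illustrate the scheme under discussion, here is a small userspace model of
a TX ring that is reclaimed only at the start of the transmit routine. It is a
sketch under stated assumptions: every demo_* name is hypothetical, the 'done'
flag stands in for the hardware's DMA-complete status, and nothing here is
taken from the bfin_mac source.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4

/* Hypothetical TX ring; 'done' stands in for the DMA-complete status bit. */
struct demo_tx_ring {
	void *skb[DEMO_RING_SIZE];
	int done[DEMO_RING_SIZE];
	unsigned int head;	/* next slot to fill */
	unsigned int tail;	/* oldest slot not yet reclaimed */
	unsigned int used;	/* slots currently holding a buffer */
	unsigned int freed;	/* buffers released so far (demo bookkeeping) */
};

/* Release every buffer whose DMA has finished, oldest first. */
static void demo_reclaim(struct demo_tx_ring *r)
{
	while (r->used && r->done[r->tail]) {
		r->skb[r->tail] = NULL;	/* a driver would free the skb here */
		r->done[r->tail] = 0;
		r->tail = (r->tail + 1) % DEMO_RING_SIZE;
		r->used--;
		r->freed++;
	}
}

/* Returns 0 on success, -1 if the ring is still full after reclaiming. */
static int demo_xmit(struct demo_tx_ring *r, void *skb)
{
	demo_reclaim(r);	/* reclaim happens only here, not in an interrupt */
	if (r->used == DEMO_RING_SIZE)
		return -1;	/* the ring-full safeguard */
	r->skb[r->head] = skb;
	r->head = (r->head + 1) % DEMO_RING_SIZE;
	r->used++;
	assert(r->used <= DEMO_RING_SIZE);
	return 0;
}
```

In this model a buffer whose DMA has completed stays in the ring until the
next call to demo_xmit(), so the last packet of a burst is held for as long as
the interface stays idle.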

>
> SKBs hold onto resources that can't be held indefinitely, such as TCP
> socket references and netfilter conntrack state.  So if you leave a
> packet in your TX ring for a long time, there might be a TCP socket
> that now cannot be closed and freed up because of that.
>
> You must therefore free them as soon as possible after the
> hardware is done with them.
>

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-19  8:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Thomas Graf, Neil Horman, netdev
In-Reply-To: <1274257637.2766.11.camel@edumazet-laptop>

On Wed, May 19, 2010 at 10:27:17AM +0200, Eric Dumazet wrote:
> 
> This is a bit wrong, at least here (CONFIG_4KSTACKS=y)
> 
> Some people still use 32bits these days ;)
> 
> Please check arch/x86/kernel/irq_32.c

Right, I was looking at the generic version.

Still, as I'm only changing tun.c where we know the call comes
from a syscall, I don't think the stack is a great issue.

Besides, for TX we already perform everything from the same stack
depth and the TX path isn't that much shallower compared to the
RX path.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-19  8:27 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Thomas Graf, Neil Horman, netdev
In-Reply-To: <20100519082047.GA24331@gondor.apana.org.au>

On Wednesday, 19 May 2010 at 18:20 +1000, Herbert Xu wrote:
> On Wed, May 19, 2010 at 10:09:42AM +0200, Eric Dumazet wrote:
> > 
> > 6) netif_rx() pro is that packet processing is done while stack usage is
> > guaranteed to be low (from process_backlog, using a special softirq
> > stack, instead of current stack)
> > 
> > After your patch, tun will use more stack. Is it safe on all contexts ?
> 
> Dave also raised this but I believe nothing changes with regards
> to the stack.  We currently call do_softirq which does not switch
> stacks.
> 
> Only a real interrupt would switch stacks.

This is a bit wrong, at least here (CONFIG_4KSTACKS=y)

Some people still use 32bits these days ;)

Please check arch/x86/kernel/irq_32.c

asmlinkage void do_softirq(void)
{
        unsigned long flags;
        struct thread_info *curctx;
        union irq_ctx *irqctx;
        u32 *isp;

        if (in_interrupt())
                return;

        local_irq_save(flags);

        if (local_softirq_pending()) {
                curctx = current_thread_info();
                irqctx = __get_cpu_var(softirq_ctx);
                irqctx->tinfo.task = curctx->task;
                irqctx->tinfo.previous_esp = current_stack_pointer;

                /* build the stack frame on the softirq stack */
                isp = (u32 *) ((char *)irqctx + sizeof(*irqctx));

                call_on_stack(__do_softirq, isp);
                /*
                 * Shouldnt happen, we returned above if in_interrupt():
                 */
                WARN_ON_ONCE(softirq_count());
        }

        local_irq_restore(flags);
}
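
As a userspace analogy only (the kernel's call_on_stack() is
arch-specific and is not implemented this way), the idea of running a
function on a dedicated stack and then returning to the caller can be
sketched with the ucontext calls available in glibc. The names
alt_stack, handler and ran_on_alt_stack are hypothetical, and the 64 KiB
stack size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024)

static char alt_stack[ALT_STACK_SIZE];  /* the dedicated stack */
static uintptr_t seen_sp;               /* address of a local seen by handler() */

/* Runs on alt_stack; records where its local variables live. */
static void handler(void)
{
	volatile char marker = 0;

	seen_sp = (uintptr_t)&marker;
}

/*
 * Run handler() on alt_stack, come back to the caller's stack, and
 * report whether handler()'s local really lived inside alt_stack.
 */
static bool ran_on_alt_stack(void)
{
	ucontext_t caller, callee;

	seen_sp = 0;
	if (getcontext(&callee) != 0)
		return false;

	callee.uc_stack.ss_sp = alt_stack;
	callee.uc_stack.ss_size = sizeof(alt_stack);
	callee.uc_link = &caller;  /* resume the caller when handler() returns */
	makecontext(&callee, handler, 0);

	/* Save the current context and switch to the dedicated stack. */
	if (swapcontext(&caller, &callee) != 0)
		return false;

	return seen_sp >= (uintptr_t)alt_stack &&
	       seen_sp < (uintptr_t)alt_stack + sizeof(alt_stack);
}
```

The point of the analogy is only that the callee's frames consume the
dedicated stack, not the stack of whoever triggered the call.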




^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-19  8:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Thomas Graf, Neil Horman, netdev
In-Reply-To: <1274257089.2766.7.camel@edumazet-laptop>

On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> On Wednesday, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> 
> > Another concern I have is about RPS.
> > 
> > netif_receive_skb() must be called from process_backlog() context, or
> > there is no guarantee the IPI will be sent if this skb is enqueued for
> > another cpu.
> 
> Hmm, I just checked again, and this is wrong.
> 
> In case we enqueue skb on a remote cpu backlog, we also
> do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> later.

OK your concern is only with the stack usage, right?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Herbert Xu @ 2010-05-19  8:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Thomas Graf, Neil Horman, netdev
In-Reply-To: <1274256582.2766.5.camel@edumazet-laptop>

On Wed, May 19, 2010 at 10:09:42AM +0200, Eric Dumazet wrote:
> 
> 6) netif_rx() pro is that packet processing is done while stack usage is
> guaranteed to be low (from process_backlog, using a special softirq
> stack, instead of current stack)
> 
> After your patch, tun will use more stack. Is it safe on all contexts ?

Dave also raised this but I believe nothing changes with regards
to the stack.  We currently call do_softirq which does not switch
stacks.

Only a real interrupt would switch stacks.

> Another concern I have is about RPS.
> 
> netif_receive_skb() must be called from process_backlog() context, or
> there is no guarantee the IPI will be sent if this skb is enqueued for
> another cpu.

Can you explain this a bit more?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-19  8:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Thomas Graf, Neil Horman, netdev
In-Reply-To: <1274256582.2766.5.camel@edumazet-laptop>

On Wednesday, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:

> Another concern I have is about RPS.
> 
> netif_receive_skb() must be called from process_backlog() context, or
> there is no guarantee the IPI will be sent if this skb is enqueued for
> another cpu.

Hmm, I just checked again, and this is wrong.

In case we enqueue skb on a remote cpu backlog, we also
do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
later.




^ permalink raw reply

