Netdev List


* RE: any change in socket systemcall or packet_mmap regarding multiqueue nic?
From: Jon Zhou @ 2010-05-19 10:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <1274243074.2485.28.camel@edumazet-laptop>

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
Sent: Wednesday, May 19, 2010 12:25 PM
To: Jon Zhou
Cc: netdev@vger.kernel.org
Subject: Re: any change in socket systemcall or packet_mmap regarding multiqueue nic?

On Tuesday, 18 May 2010 at 19:55 -0700, Jon Zhou wrote:
> hi
> the multiqueue networking can utilize multi-core to process packets from multiqueue nic,
> but any change in related userspace application part, such as socket system call, packet_mmap? these userspace API can also utilize multicore to process packets from kernel?
> otherwise they have to read data in serialization
> 

That's a rather general question. Work is in progress.

So far, you can use a new condition in filters to match a given queue
index for incoming packets. A sniffer could set up N different sockets to
receive data from N NIC queues.

jon-> Is it something like "ioctl(fd, SOL_SOCKET, queue_id, ...)"? Could you tell me the keyword?

For tcp flows, nothing is needed, since all packets of a given flow
should use same queue.

BTW, do you think RFS would help with this?

However, the current tx queue selection is based on the sk->sk_hash value,
which is computed on the Linux side, and this differs from the rx queue
selection done by the NIC firmware. So tx packets use a different queue
than rx packets for a given tcp flow. This is suboptimal: tcp_ack() can
run on a different cpu than the TX completion handler.
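
For illustration only, the host-side choice can be sketched as a
multiply-and-shift reduction of a per-socket hash. The helper name below is
hypothetical, and the assumption that the kernel reduces sk->sk_hash in
exactly this way is not something stated in this thread (the real path may
also mix in a random seed first). The point is only that the result depends
on a host-computed hash and the number of TX queues, not on whatever hash the
NIC applied on receive.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): map a 32-bit flow hash
 * onto one of `nqueues` TX queues by scaling it into [0, nqueues).
 * Assumption: a multiply-and-shift reduction, chosen for illustration.
 */
static uint16_t pick_tx_queue_sketch(uint32_t hash, uint16_t nqueues)
{
    return (uint16_t)(((uint64_t)hash * nqueues) >> 32);
}
```

Two packets of the same flow always get the same TX queue here, but nothing
ties that queue index to the RX queue the NIC picked for the flow.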

The TX completion handler touches the cloned skb that TCP used to transmit
the buffer. Freeing it touches the dataref atomic counter in the packet.

This should be addressed somehow.

^ permalink raw reply

* Re: [GIT] Networking
From: Ben Hutchings @ 2010-05-19 11:13 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20100518.233752.256882583.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 573 bytes --]

On Tue, 2010-05-18 at 23:37 -0700, David Miller wrote:
[...]
> Some other things that stand out:
> 
> 1) Allow the administrator to reserve port ranges, such that the
>    kernel bind allocation scheme won't use them.  From Amerigo Wang.
> 
> 2) ipv6 address et al. handling converted to use generic kernel lists.
>    From Stephen Hemminger.
> 
> 3) PHY module autoloading support from David Howells
[...]

For the record, that was actually David Woodhouse.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* [PATCH net-next-2.6] bonding: move dev_addr cpy to bond_enslave
From: Jiri Pirko @ 2010-05-19 11:14 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, bonding-devel

Move the code that copies the slave's MAC address, when it is the first slave,
into bond_enslave. The ifenslave app also does this, but that's not a problem.
This is something that should be done in bond_enslave, and it should not matter
from where it is called.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c  |    7 +++++++
 drivers/net/bonding/bond_sysfs.c |    8 --------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2c3f9db..4e7473e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1522,6 +1522,13 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		}
 	}
 
+	/* If this is the first slave, then we need to set the master's hardware
+	 * address to be the same as the slave's. */
+	if (bond->slave_cnt == 0)
+		memcpy(bond->dev->dev_addr, slave_dev->dev_addr,
+		       slave_dev->addr_len);
+
+
 	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
 	if (!new_slave) {
 		res = -ENOMEM;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a4cbaf7..496ac1e 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -250,14 +250,6 @@ static ssize_t bonding_store_slaves(struct device *d,
 	switch (command[0]) {
 	case '+':
 		pr_info("%s: Adding slave %s.\n", bond->dev->name, dev->name);
-
-		/* If this is the first slave, then we need to set
-		   the master's hardware address to be the same as the
-		   slave's. */
-		if (is_zero_ether_addr(bond->dev->dev_addr))
-			memcpy(bond->dev->dev_addr, dev->dev_addr,
-			       dev->addr_len);
-
 		res = bond_enslave(bond->dev, dev);
 		break;
 
-- 
1.6.6.1


^ permalink raw reply related

* [PATCH net-next-2.6] bonding: remove unused original_flags struct slave member
From: Jiri Pirko @ 2010-05-19 11:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, bonding-devel

This is stored but never restored, so remove it as it is useless.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c |    5 -----
 drivers/net/bonding/bonding.h   |    1 -
 2 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4e7473e..ef60244 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1535,11 +1535,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		goto err_undo_flags;
 	}
 
-	/* save slave's original flags before calling
-	 * netdev_set_master and dev_open
-	 */
-	new_slave->original_flags = slave_dev->flags;
-
 	/* Save slave's original mtu and then set it to match the bond */
 	new_slave->original_mtu = slave_dev->mtu;
 	res = dev_set_mtu(slave_dev, bond->dev->mtu);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 2aa3367..da80964 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -159,7 +159,6 @@ struct slave {
 	s8     link;    /* one of BOND_LINK_XXXX */
 	s8     new_link;
 	s8     state;   /* one of BOND_STATE_XXXX */
-	u32    original_flags;
 	u32    original_mtu;
 	u32    link_failure_count;
 	u8     perm_hwaddr[ETH_ALEN];
-- 
1.6.6.1


^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-19 12:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <1274257089.2766.7.camel@edumazet-laptop>

On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> On Wednesday, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> 
> > Another concern I have is about RPS.
> > 
> > netif_receive_skb() must be called from process_backlog() context, or
> > there is no guarantee the IPI will be sent if this skb is enqueued for
> > another cpu.
> 
> Hmm, I just checked again, and this is wrong.
> 
> In case we enqueue skb on a remote cpu backlog, we also
> do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> later.
> 
But if this happens, then we lose the connection between the packet being
received and the process doing the reception, so the network cgroup classifier
breaks again.

Performance gains are still a big advantage here of course.
Neil


^ permalink raw reply

* RE: any change in socket systemcall or packet_mmap regarding multiqueue nic?
From: Eric Dumazet @ 2010-05-19 12:28 UTC (permalink / raw)
  To: Jon Zhou; +Cc: netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497EFC946@MILEXCH2.ds.jdsu.net>

On Wednesday, 19 May 2010 at 03:36 -0700, Jon Zhou wrote:
> 
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
> Sent: Wednesday, May 19, 2010 12:25 PM
> To: Jon Zhou
> Cc: netdev@vger.kernel.org
> Subject: Re: any change in socket systemcall or packet_mmap regarding multiqueue nic?
> 
> On Tuesday, 18 May 2010 at 19:55 -0700, Jon Zhou wrote:
> > hi
> > the multiqueue networking can utilize multi-core to process packets from multiqueue nic,
> > but any change in related userspace application part, such as socket system call, packet_mmap? these userspace API can also utilize multicore to process packets from kernel?
> > otherwise they have to read data in serialization
> > 
> 
> That's a rather general question. Work is in progress.
> 
> So far, you can use a new condition in filters to match a given queue
> index for incoming packets. A sniffer could set up N different sockets to
> receive data from N NIC queues.
> 
> jon-> Is it something like "ioctl(fd, SOL_SOCKET, queue_id, ...)"? Could you tell me the keyword?

The keyword is BPF (used in libpcap) and the SKF_AD_QUEUE instruction.

The kernel part is ready:

commit d19742fb1c68e6db83b76e06dea5a374c99e104f
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Tue Oct 20 01:06:22 2009 -0700

    filter: Add SKF_AD_QUEUE instruction
    
    It can help being able to filter packets on their queue_mapping.
    
    If filter performance is not good, we could add a "numqueue" field
    in struct packet_type, so that netif_nit_deliver() and other functions
    can directly ignore packets with not expected queue number.
    
    Lets experiment this simple filter extension first.
    
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 909193e..bb3b435 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -124,7 +124,8 @@ struct sock_fprog   /* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_NLATTR  12
 #define SKF_AD_NLATTR_NEST     16
 #define SKF_AD_MARK    20
-#define SKF_AD_MAX     24
+#define SKF_AD_QUEUE   24
+#define SKF_AD_MAX     28
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index e3987e1..08db7b9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -306,6 +306,9 @@ load_b:
                case SKF_AD_MARK:
                        A = skb->mark;
                        continue;
+               case SKF_AD_QUEUE:
+                       A = skb->queue_mapping;
+                       continue;
                case SKF_AD_NLATTR: {
                        struct nlattr *nla;
 




^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-19 12:55 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <20100519120547.GB26584@hmsreliant.think-freely.org>

On Wed, May 19, 2010 at 08:05:47AM -0400, Neil Horman wrote:
> On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > On Wednesday, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > 
> > > Another concern I have is about RPS.
> > > 
> > > netif_receive_skb() must be called from process_backlog() context, or
> > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > another cpu.
> > 
> > Hmm, I just checked again, and this is wrong.
> > 
> > In case we enqueue skb on a remote cpu backlog, we also
> > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > later.
> > 
> But if this happens, then we lose the connection between the packet being
> received and the process doing the reception, so the network cgroup classifier
> breaks again.
> 
> Performance gains are still a big advantage here of course.
> Neil
> 
Scratch what I said here; Herbert corrected me on this, and we're OK, as tun has
no RPS map.

I'll test this patch out in just a bit
Neil


^ permalink raw reply

* Re: [GIT] Networking
From: John W. Linville @ 2010-05-19 13:09 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, torvalds, akpm, netdev, linux-kernel
In-Reply-To: <1274267582.2763.24.camel@localhost>

On Wed, May 19, 2010 at 12:13:02PM +0100, Ben Hutchings wrote:
> On Tue, 2010-05-18 at 23:37 -0700, David Miller wrote:

> > 3) PHY module autoloading support from David Howells

> For the record, that was actually David Woodhouse.

All Davids are the same... :-)

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* [PATCH net-next-2.6] bonding: optimize tlb_get_least_loaded_slave
From: Jiri Pirko @ 2010-05-19 13:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, fubar, bonding-devel

In the worst case, when the first loop breaks at the end of the slave list,
the slave list is iterated through twice. This patch reduces this function
to a single loop and also makes it simpler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_alb.c |   33 +++++++++++++--------------------
 1 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 40fdc41..25c14c6 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -233,34 +233,27 @@ static void tlb_deinitialize(struct bonding *bond)
 	_unlock_tx_hashtbl(bond);
 }
 
+static long long compute_gap(struct slave *slave)
+{
+	return (s64) (slave->speed << 20) - /* Convert to Megabit per sec */
+	       (s64) (SLAVE_TLB_INFO(slave).load << 3); /* Bytes to bits */
+}
+
 /* Caller must hold bond lock for read */
 static struct slave *tlb_get_least_loaded_slave(struct bonding *bond)
 {
 	struct slave *slave, *least_loaded;
-	s64 max_gap;
-	int i, found = 0;
-
-	/* Find the first enabled slave */
-	bond_for_each_slave(bond, slave, i) {
-		if (SLAVE_IS_OK(slave)) {
-			found = 1;
-			break;
-		}
-	}
-
-	if (!found) {
-		return NULL;
-	}
+	long long max_gap;
+	int i;
 
-	least_loaded = slave;
-	max_gap = (s64)(slave->speed << 20) - /* Convert to Megabit per sec */
-			(s64)(SLAVE_TLB_INFO(slave).load << 3); /* Bytes to bits */
+	least_loaded = NULL;
+	max_gap = LLONG_MIN;
 
 	/* Find the slave with the largest gap */
-	bond_for_each_slave_from(bond, slave, i, least_loaded) {
+	bond_for_each_slave(bond, slave, i) {
 		if (SLAVE_IS_OK(slave)) {
-			s64 gap = (s64)(slave->speed << 20) -
-					(s64)(SLAVE_TLB_INFO(slave).load << 3);
+			long long gap = compute_gap(slave);
+
 			if (max_gap < gap) {
 				least_loaded = slave;
 				max_gap = gap;
-- 
1.6.6.1
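
The single-pass selection described in the commit message above can be
sketched outside the kernel as follows. This is only an illustration under
stated assumptions: struct slave_sketch and both function names are
hypothetical stand-ins for the bonding structures, and the sketch widens the
operands to 64 bits before shifting, which differs from the order of the cast
and shift in the patch itself.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct slave used here. */
struct slave_sketch {
    int      ok;     /* nonzero if the slave is usable (cf. SLAVE_IS_OK) */
    uint32_t speed;  /* link speed in Mbit/s */
    uint32_t load;   /* current load in bytes */
};

/* Spare capacity: speed scaled by 2^20 minus load converted to bits. */
static long long compute_gap_sketch(const struct slave_sketch *s)
{
    return ((long long)s->speed << 20) - ((long long)s->load << 3);
}

/* One pass over the array; returns NULL when no slave is usable. */
static const struct slave_sketch *
least_loaded_sketch(const struct slave_sketch *slaves, size_t n)
{
    const struct slave_sketch *best = NULL;
    long long max_gap = LLONG_MIN;

    for (size_t i = 0; i < n; i++) {
        long long gap;

        if (!slaves[i].ok)
            continue;
        gap = compute_gap_sketch(&slaves[i]);
        if (max_gap < gap) {
            best = &slaves[i];
            max_gap = gap;
        }
    }
    return best;
}
```

Starting from LLONG_MIN means the first usable slave always wins the first
comparison, so no separate "find the first enabled slave" loop is needed.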


^ permalink raw reply related

* Re: r8169 transmit queue time outs
From: Kyle McMartin @ 2010-05-19 13:43 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Kyle McMartin, netdev, dgilmore
In-Reply-To: <20100506201024.GA3541@electric-eye.fr.zoreil.com>

On Thu, May 06, 2010 at 10:10:24PM +0200, Francois Romieu wrote:
> Kyle McMartin <kmcmartin@redhat.com> :
> [...]
> > Some of our users have been seeing their r8169 cards just up and stop
> > transmitting packets pretty quickly after boot with recent kernels.
> [...]
> > Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.i686.PAE #1
> 
> Can they upgrade to 2.6.32.11-99.fc12.i686 and try an out-of-tree build
> of the driver at http://userweb.kernel.org/~romieu/r8169/2.6.32.11-99.fc12/ ?
> 
> It should be quite close to the current git kernel.
> 

I provided a bunch of testers with a backport of the current git head
r8169 driver, and sadly, they report the TX timeout issues still occur.
:/

Any other ideas?

regards, Kyle

^ permalink raw reply

* Re: r8169 transmit queue time outs
From: Kyle McMartin @ 2010-05-19 13:48 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: Francois Romieu, netdev, dgilmore
In-Reply-To: <20100519134344.GI3900@ihatethathostname.lab.bos.redhat.com>

On Wed, May 19, 2010 at 09:43:44AM -0400, Kyle McMartin wrote:
> I provided a bunch of testers with a backport of the current git head
> r8169 driver, and sadly, they report the TX timeout issues still occur.
> :/
> 
> Any other ideas?
> 

They note the vendor driver seems to work without these transmit
timeouts... Would it be worthwhile for me to take a look and compare the
tx setup between the git head and the latest vendor driver?

regards, Kyle

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2010-05-19 14:10 UTC (permalink / raw)
  To: Neil Horman; +Cc: Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <20100519120547.GB26584@hmsreliant.think-freely.org>

On Wednesday, 19 May 2010 at 08:05 -0400, Neil Horman wrote:
> On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > On Wednesday, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > 
> > > Another concern I have is about RPS.
> > > 
> > > netif_receive_skb() must be called from process_backlog() context, or
> > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > another cpu.
> > 
> > Hmm, I just checked again, and this is wrong.
> > 
> > In case we enqueue skb on a remote cpu backlog, we also
> > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > later.
> > 
> But if this happens, then we lose the connection between the packet being
> received and the process doing the reception, so the network cgroup classifier
> breaks again.
> 
> Performance gains are still a big advantage here of course.

RPS is enabled on a per device (or more precisely per subqueue) basis, and disabled
by default, so if cgroup classifier is needed, it should work as is.

Speaking of net/sched/cls_cgroup.c, I am contemplating the following
sequence:

rcu_read_lock();
classid = task_cls_state(current)->classid;
rcu_read_unlock();

RCU is definitely so special (should I say magic?) that we use it even
if not needed. It makes us happy...

^ permalink raw reply

* Re: r8169 transmit queue time outs
From: Eric Dumazet @ 2010-05-19 14:18 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: Francois Romieu, netdev, dgilmore
In-Reply-To: <20100519134344.GI3900@ihatethathostname.lab.bos.redhat.com>

On Wednesday, 19 May 2010 at 09:43 -0400, Kyle McMartin wrote:
> On Thu, May 06, 2010 at 10:10:24PM +0200, Francois Romieu wrote:
> > Kyle McMartin <kmcmartin@redhat.com> :
> > [...]
> > > Some of our users have been seeing their r8169 cards just up and stop
> > > transmitting packets pretty quickly after boot with recent kernels.
> > [...]
> > > Pid: 0, comm: swapper Not tainted 2.6.31.5-127.fc12.i686.PAE #1
> > 
> > Can they upgrade to 2.6.32.11-99.fc12.i686 and try an out-of-tree build
> > of the driver at http://userweb.kernel.org/~romieu/r8169/2.6.32.11-99.fc12/ ?
> > 
> > It should be quite close to the current git kernel.
> > 
> 
> I provided a bunch of testers with a backport of the current git head
> r8169 driver, and sadly, they report the TX timeout issues still occur.
> :/
> 
> Any other ideas?

Scratch the NIC?

It is probably unrelated, but I once mentioned the following patch, which
could be tried. (Do not reset the NIC if we receive too many frames in a row.)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..c4dbb15 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4520,10 +4520,8 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 				dev->stats.rx_length_errors++;
 			if (status & RxCRC)
 				dev->stats.rx_crc_errors++;
-			if (status & RxFOVF) {
-				rtl8169_schedule_work(dev, rtl8169_reset_task);
+			if (status & RxFOVF)
 				dev->stats.rx_fifo_errors++;
-			}
 			rtl8169_mark_to_asic(desc, tp->rx_buf_sz);
 		} else {
 			struct sk_buff *skb = tp->Rx_skbuff[entry];



^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-19 14:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neil Horman, Herbert Xu, David S. Miller, Thomas Graf, netdev
In-Reply-To: <1274278229.2766.112.camel@edumazet-laptop>

On Wed, May 19, 2010 at 04:10:29PM +0200, Eric Dumazet wrote:
> On Wednesday, 19 May 2010 at 08:05 -0400, Neil Horman wrote:
> > On Wed, May 19, 2010 at 10:18:09AM +0200, Eric Dumazet wrote:
> > > On Wednesday, 19 May 2010 at 10:09 +0200, Eric Dumazet wrote:
> > > 
> > > > Another concern I have is about RPS.
> > > > 
> > > > netif_receive_skb() must be called from process_backlog() context, or
> > > > there is no guarantee the IPI will be sent if this skb is enqueued for
> > > > another cpu.
> > > 
> > > Hmm, I just checked again, and this is wrong.
> > > 
> > > In case we enqueue skb on a remote cpu backlog, we also
> > > do __raise_softirq_irqoff(NET_RX_SOFTIRQ); so the IPI will be done
> > > later.
> > > 
> > But if this happens, then we lose the connection between the packet being
> > received and the process doing the reception, so the network cgroup classifier
> > breaks again.
> > 
> > Performance gains are still a big advantage here of course.
> 
> RPS is enabled on a per device (or more precisely per subqueue) basis, and disabled
> by default, so if cgroup classifier is needed, it should work as is.
> 
> Speaking of net/sched/cls_cgroup.c, I am contemplating following
> sequence :
> 
> rcu_read_lock();
> classid = task_cls_state(current)->classid;
> rcu_read_unlock();
> 
Yeah, I'd noticed there was no write side to that, but hadn't quite gotten to
investigating that further :)

> RCU is definitly so special (should I say magic ?), that we use it even
> if not needed. It makes us happy...
> 
> 
> 
> 

^ permalink raw reply

* sky2 poweroff screws up my network
From: Kyle McMartin @ 2010-05-19 14:32 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

Hi Stephen,

I've noticed a rather strange problem with the onboard sky2 on one of my
machines. When I halt -p it, my router stops passing traffic completely
until I kill the power to the machine entirely. If it's just been shut
down with halt -p, it recovers immediately after turning it back on,
while it's in BIOS.

Any thoughts on this? My guess is it's because the sky2 shutdown puts the
NIC to sleep and when the host is off, bad things happen.

regards, Kyle

^ permalink raw reply

* Re: sky2 poweroff screws up my network
From: Stephen Hemminger @ 2010-05-19 15:14 UTC (permalink / raw)
  To: Kyle McMartin; +Cc: netdev
In-Reply-To: <20100519143208.GL3900@ihatethathostname.lab.bos.redhat.com>

On Wed, 19 May 2010 10:32:08 -0400
Kyle McMartin <kmcmartin@redhat.com> wrote:

> Hi Stephen,
> 
> I've noticed a rather strange problem with the onboard sky2 on one of my
> machines. When I halt -p it, my router stops passing traffic completely
> until I kill the power to the machine entirely. If it's just been shut
> down with halt -p, it recovers immediately after turning it back on,
> while it's in BIOS.
> 
> Any thoughts on this? My guess is it's because the sky2 shutdown puts the
> NIC to sleep and when the host is off, bad things happen.
> 
> regards, Kyle

The sky2 shutdown puts the chip in Wake On Lan state; this
does a separate link speed negotiation (100 mbit) which may be a problem
if speed duplex is forced.

^ permalink raw reply

* Re: [PATCH 3/4] fec: add support for Freescale i.MX25 PDK (3DS)
From: Jean-Christophe Dubois @ 2010-05-19 15:15 UTC (permalink / raw)
  To: netdev, linux-arm-kernel; +Cc: Baruch Siach, Sascha Hauer, Greg Ungerer
In-Reply-To: <20100125112116.GF6724@jasper.tkos.co.il>

On Monday, 25 January 2010, Baruch Siach wrote:
> Hi Greg, netdev,
> 
> On Wed, Dec 16, 2009 at 08:34:06AM +0200, Baruch Siach wrote:
> > On Wed, Dec 16, 2009 at 10:13:56AM +1000, Greg Ungerer wrote:
> > > Baruch Siach wrote:
> > > >On Tue, Dec 15, 2009 at 09:52:24PM +1000, Greg Ungerer wrote:
> > > >>On 12/15/2009 06:31 PM, Baruch Siach wrote:
> > > >>>+#ifndef CONFIG_M5272
> > > >>
> > > >>I would suggest making this conditional on FEC_MIIGSK_ENR.
> > > >>Although the CONFIG_M5272 is the only case here currently,
> > > >>that may change over the years. And using this here may not
> > > >>be obvious to the casual code reader, since the register
> > > >>offset definitions don't explicitly key on CONFIG_M5272.
> > > >
> > > >OK, I'll change this conditional.
> > > >
> > > >Can I take this as an Ack from you?
> > > 
> > > With that conditional check changed, sure:
> > > 
> > > Acked-by:  Greg Ungerer <gerg@uclinux.org>
> > 
> > Thanks. The updated patch below.
> 
> I'm really sorry to bug on this again, but since the platform code is
> already upstream the i.MX25 code doesn't build without this patch
> (include/linux/fec.h missing). So, someone please pick up this patch,
> preferably prior to .33.
> 
> baruch

I am just wondering if somebody is going to pick up this patch 
(http://patchwork.ozlabs.org/patch/41235/) so that it finds its way into 
mainline.

The i.MX25 PDK platform (3DStack) has been broken for 2 kernel releases (as 
the actual code depends on this). Is there hope that this patch will be picked 
up by somebody in order to get merged into mainline someday?

Or is it not acceptable in this state for some reason?

Either the FEC changes need to be merged (possibly with some modification if 
there are issues) or the part depending on them in the ARM tree needs to be 
reverted. We should not stay in this inconsistent state.

JC

^ permalink raw reply

* how many msi (msi-x) vectors can be setup?
From: zhou rui @ 2010-05-19 15:18 UTC (permalink / raw)
  To: netdev

hi there:
how many MSI (MSI-X) vectors can be set up?
Is the number limited by the hardware (NIC) or by the kernel?
I found that the driver (Broadcom 57711, ver 1.5.12) tried to request
16 queues on my 2.6.27 kernel, but only 2 were available.
Will the number increase if I update the driver or the kernel?
Is there also a system-wide limit? If other devices have
already occupied too many MSI vectors, there may not be enough left.

thanks
rui

^ permalink raw reply

* Re: [PATCH iproute2] document initcwnd
From: Stephen Hemminger @ 2010-05-19 15:31 UTC (permalink / raw)
  To: Brian Bloniarz; +Cc: dormando, netdev, Rick Jones, shemminger
In-Reply-To: <4BE21D64.4040600@athenacr.com>

On Wed, 05 May 2010 21:37:40 -0400
Brian Bloniarz <bmb@athenacr.com> wrote:

> Stephen Hemminger wrote:
> > On Wed, 05 May 2010 16:56:34 -0400
> > Brian Bloniarz <bmb@athenacr.com> wrote:
> > 
> >> dormando wrote:
> >>>> This sounds like TCP slow start.
> >>>>
> >>>> http://en.wikipedia.org/wiki/Slow-start
> >>>>
> >>>> As far as tunables you might want to play with the initcwnd route
> >>>> flag (see "ip route help")
> >>> Ah, yes, initcwnd was it. I'm well aware of TCP Congestion control / slow
> >>> start / etc. However I couldn't find the damn tunable for it :)
> >> Documenting the flag in ip(8) might increase its visibility
> >> a little. I don't see it documented in the iproute2 git head,
> >> though it shows up on http://linux.die.net/man/8/ip somehow.
> >>
> >> Stephen, do you know why that is?
> > 
> > No one sent me an official patch to change it?
> 
> Mention initcwnd in ip(8). Text taken from doc/ip-cref.tex.

Applied

^ permalink raw reply

* Re: sky2 poweroff screws up my network
From: Kyle McMartin @ 2010-05-19 15:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Kyle McMartin, netdev
In-Reply-To: <20100519081440.2883e71d@nehalam>

On Wed, May 19, 2010 at 08:14:40AM -0700, Stephen Hemminger wrote:
> The sky2 shutdown puts the chip in Wake On Lan state; this
> does a separate link speed negotiation (100 mbit) which may be a problem
> if speed duplex is forced.
>

Hrm, I'd disabled WoL in ethtool explicitly hoping this sort of thing
could be avoided, but it doesn't seem to help. :/

I'm not sure what you mean by "if speed duplex is forced"; I've left the
NIC in autoneg mode.

regards, Kyle

^ permalink raw reply

* [ANNOUNCE] iproute2 2.6.34
From: Stephen Hemminger @ 2010-05-19 15:54 UTC (permalink / raw)
  To: netdev; +Cc: linux-net, linux-kernel

This version of the iproute2 utilities is intended for use with the 2.6.34 or
later kernel, but should be backward compatible with older releases.
In addition to build and man page fixes, this release includes
support for several new features:

   * SR-IOV (I/O Virtualization) support.
   * tuntap support
   * bus-error reporting and counters
   * new FIFO type head drop queue discipline

The tar ball is available at:
  http://devresources.linuxfoundation.org/dev/iproute2/download

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

For more info on iproute2 see:
  http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2

Report problems (or enhancements) to the netdev@vger.kernel.org mailing list.

Changes since last release (2.6.33)

Alexandre Cassen (1):
      Detect 6rd kernel missing support / 6rd tunnel scope

Andreas Henriksson (2):
      iproute2: detect iptables modules dir in configure.
      iproute2: add option to build m_xt as a tc module (v3)

Bart Trojanowski (1):
      fix build issues with flex ver 2.5

Brian Bloniarz (1):
      ip: document initcwnd

Chris Wright (1):
      iproute2: rework SR-IOV VF support

David Woodhouse (1):
      Add 'ip tuntap' support.

Florian Westphal (1):
      iproute2: fix addrlabel interface names handling

Hagen Paul Pfeifer (1):
      tc: add new queue discipline: head drop fifo

Jamal Hadi Salim (3):
      xfrm: policy by mark
      xfrm: Introduce xfrm by mark
      xfrm: add support for SA by mark

Jan Engelhardt (1):
      ip: correctly report tunnel link type

Michele Petrazzo - Unipex (1):
      Continue after errors in -batch

Williams, Mitch A (3):
      Update man page to indicate current options
      ip: Add support for setting and showing SR-IOV virtual funtion link params
      libnetlink: Modify the parser to track first duplicated attributes

Wolfgang Grandegger (1):
      iproute2: netlink support for bus-error reporting and counters

YOSHIFUJI Hideaki / 吉藤英明 (1):
      gaiconf: /etc/gai.conf configuration helper.

jamal (1):
      skbedit: use get_u32 for parsing mark

laurent chavey (1):
      Add initcwnd to iproute2

Stephen Hemminger:
      Fix line numbering on batch commands
      Remove mirred debug message
      Workaround missing ALIGN() macro.
      Update ip.8 man page to describe route table id values
      Update kernel headers to 2.6.34 final version
      Add documentation for ip link add/delete sub-commands
      ip: add documentation for initrwnd
      v2.6.34

^ permalink raw reply

* [PATCH] drivers/net/arcnet/capmode.c: clean up code
From: Daniel Mack @ 2010-05-19 16:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Daniel Mack, Tejun Heo, Jiri Kosina, Christoph Lameter,
	Joe Perches, netdev

 - shuffle around functions to get rid of forward declarations
 - fix some CodingStyle and indentation issues
 - last but not least, get rid of the following CONFIG_MODULE=n warning:

	drivers/net/arcnet/capmode.c:52: warning: ‘capmode_proto’ defined but not used

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: netdev@vger.kernel.org
---
 drivers/net/arcnet/capmode.c |  176 +++++++++++++++++++-----------------------
 1 files changed, 79 insertions(+), 97 deletions(-)

diff --git a/drivers/net/arcnet/capmode.c b/drivers/net/arcnet/capmode.c
index 20e833a..e1810a3 100644
--- a/drivers/net/arcnet/capmode.c
+++ b/drivers/net/arcnet/capmode.c
@@ -37,67 +37,6 @@
 
 #define VERSION "arcnet: cap mode (`c') encapsulation support loaded.\n"
 
-
-static void rx(struct net_device *dev, int bufnum,
-	       struct archdr *pkthdr, int length);
-static int build_header(struct sk_buff *skb,
-			struct net_device *dev,
-			unsigned short type,
-			uint8_t daddr);
-static int prepare_tx(struct net_device *dev, struct archdr *pkt, int length,
-		      int bufnum);
-static int ack_tx(struct net_device *dev, int acked);
-
-
-static struct ArcProto capmode_proto =
-{
-	'r',
-	XMTU,
-	0,
-       	rx,
-	build_header,
-	prepare_tx,
-	NULL,
-	ack_tx
-};
-
-#ifdef MODULE
-
-static void arcnet_cap_init(void)
-{
-	int count;
-
-	for (count = 1; count <= 8; count++)
-		if (arc_proto_map[count] == arc_proto_default)
-			arc_proto_map[count] = &capmode_proto;
-
-	/* for cap mode, we only set the bcast proto if there's no better one */
-	if (arc_bcast_proto == arc_proto_default)
-		arc_bcast_proto = &capmode_proto;
-
-	arc_proto_default = &capmode_proto;
-	arc_raw_proto = &capmode_proto;
-}
-
-static int __init capmode_module_init(void)
-{
-	printk(VERSION);
-	arcnet_cap_init();
-	return 0;
-}
-
-static void __exit capmode_module_exit(void)
-{
-	arcnet_unregister_proto(&capmode_proto);
-}
-module_init(capmode_module_init);
-module_exit(capmode_module_exit);
-
-MODULE_LICENSE("GPL");
-
-#endif				/* MODULE */
-
-
 /* packet receiver */
 static void rx(struct net_device *dev, int bufnum,
 	       struct archdr *pkthdr, int length)
@@ -229,65 +168,108 @@ static int prepare_tx(struct net_device *dev, struct archdr *pkt, int length,
 	BUGMSG(D_DURING, "prepare_tx: length=%d ofs=%d\n",
 	       length,ofs);
 
-	// Copy the arcnet-header + the protocol byte down:
+	/* Copy the arcnet-header + the protocol byte down: */
 	lp->hw.copy_to_card(dev, bufnum, 0, hard, ARC_HDR_SIZE);
 	lp->hw.copy_to_card(dev, bufnum, ofs, &pkt->soft.cap.proto,
 			    sizeof(pkt->soft.cap.proto));
 
-	// Skip the extra integer we have written into it as a cookie
-	// but write the rest of the message:
+	/* Skip the extra integer we have written into it as a cookie
+	   but write the rest of the message: */
 	lp->hw.copy_to_card(dev, bufnum, ofs+1,
 			    ((unsigned char*)&pkt->soft.cap.mes),length-1);
 
 	lp->lastload_dest = hard->dest;
 
-	return 1;		/* done */
+	return 1;	/* done */
 }
 
-
 static int ack_tx(struct net_device *dev, int acked)
 {
-  struct arcnet_local *lp = netdev_priv(dev);
-  struct sk_buff *ackskb;
-  struct archdr *ackpkt;
-  int length=sizeof(struct arc_cap);
+	struct arcnet_local *lp = netdev_priv(dev);
+	struct sk_buff *ackskb;
+	struct archdr *ackpkt;
+	int length=sizeof(struct arc_cap);
+
+	BUGMSG(D_DURING, "capmode: ack_tx: protocol: %x: result: %d\n",
+		lp->outgoing.skb->protocol, acked);
 
-  BUGMSG(D_DURING, "capmode: ack_tx: protocol: %x: result: %d\n",
-	 lp->outgoing.skb->protocol, acked);
+	BUGLVL(D_SKB) arcnet_dump_skb(dev, lp->outgoing.skb, "ack_tx");
 
-  BUGLVL(D_SKB) arcnet_dump_skb(dev, lp->outgoing.skb, "ack_tx");
+	/* Now alloc a skb to send back up through the layers: */
+	ackskb = alloc_skb(length + ARC_HDR_SIZE , GFP_ATOMIC);
+	if (ackskb == NULL) {
+		BUGMSG(D_NORMAL, "Memory squeeze, can't acknowledge.\n");
+		goto free_outskb;
+	}
 
-  /* Now alloc a skb to send back up through the layers: */
-  ackskb = alloc_skb(length + ARC_HDR_SIZE , GFP_ATOMIC);
-  if (ackskb == NULL) {
-	  BUGMSG(D_NORMAL, "Memory squeeze, can't acknowledge.\n");
-	  goto free_outskb;
-  }
+	skb_put(ackskb, length + ARC_HDR_SIZE );
+	ackskb->dev = dev;
 
-  skb_put(ackskb, length + ARC_HDR_SIZE );
-  ackskb->dev = dev;
+	skb_reset_mac_header(ackskb);
+	ackpkt = (struct archdr *)skb_mac_header(ackskb);
+	/* skb_pull(ackskb, ARC_HDR_SIZE); */
 
-  skb_reset_mac_header(ackskb);
-  ackpkt = (struct archdr *)skb_mac_header(ackskb);
-  /* skb_pull(ackskb, ARC_HDR_SIZE); */
+	skb_copy_from_linear_data(lp->outgoing.skb, ackpkt,
+				  ARC_HDR_SIZE + sizeof(struct arc_cap));
+	ackpkt->soft.cap.proto = 0; /* using protocol 0 for acknowledge */
+	ackpkt->soft.cap.mes.ack=acked;
 
+	BUGMSG(D_PROTO, "Ackknowledge for cap packet %x.\n",
+			*((int*)&ackpkt->soft.cap.cookie[0]));
+
+	ackskb->protocol = cpu_to_be16(ETH_P_ARCNET);
+
+	BUGLVL(D_SKB) arcnet_dump_skb(dev, ackskb, "ack_tx_recv");
+	netif_rx(ackskb);
+
+free_outskb:
+	dev_kfree_skb_irq(lp->outgoing.skb);
+	lp->outgoing.proto = NULL; /* We are always finished when in this protocol */
+
+	return 0;
+}
 
-  skb_copy_from_linear_data(lp->outgoing.skb, ackpkt,
-		ARC_HDR_SIZE + sizeof(struct arc_cap));
-  ackpkt->soft.cap.proto=0; /* using protocol 0 for acknowledge */
-  ackpkt->soft.cap.mes.ack=acked;
+static struct ArcProto capmode_proto =
+{
+	'r',
+	XMTU,
+	0,
+	rx,
+	build_header,
+	prepare_tx,
+	NULL,
+	ack_tx
+};
+
+static void arcnet_cap_init(void)
+{
+	int count;
 
-  BUGMSG(D_PROTO, "Ackknowledge for cap packet %x.\n",
-	 *((int*)&ackpkt->soft.cap.cookie[0]));
+	for (count = 1; count <= 8; count++)
+		if (arc_proto_map[count] == arc_proto_default)
+			arc_proto_map[count] = &capmode_proto;
 
-  ackskb->protocol = cpu_to_be16(ETH_P_ARCNET);
+	/* for cap mode, we only set the bcast proto if there's no better one */
+	if (arc_bcast_proto == arc_proto_default)
+		arc_bcast_proto = &capmode_proto;
 
-  BUGLVL(D_SKB) arcnet_dump_skb(dev, ackskb, "ack_tx_recv");
-  netif_rx(ackskb);
+	arc_proto_default = &capmode_proto;
+	arc_raw_proto = &capmode_proto;
+}
 
- free_outskb:
-  dev_kfree_skb_irq(lp->outgoing.skb);
-  lp->outgoing.proto = NULL; /* We are always finished when in this protocol */
+static int __init capmode_module_init(void)
+{
+	printk(VERSION);
+	arcnet_cap_init();
+	return 0;
+}
 
-  return 0;
+static void __exit capmode_module_exit(void)
+{
+	arcnet_unregister_proto(&capmode_proto);
 }
+module_init(capmode_module_init);
+module_exit(capmode_module_exit);
+
+MODULE_LICENSE("GPL");
+
-- 
1.7.1
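
The reordering in the patch above can be illustrated outside the kernel. This is a minimal sketch, not code from capmode.c: `struct demo_proto`, `demo_rx`, `demo_ack_tx`, `demo_capmode` and `demo_init` are hypothetical names. It shows the two points the changelog makes: handlers defined before the table that references them need no forward declarations, and a static table that is always referenced (rather than only inside an `#ifdef`) does not trigger a "defined but not used" warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol table, loosely modelled on the idea of struct ArcProto. */
struct demo_proto {
	char suffix;
	int (*rx)(int length);
	int (*ack_tx)(int acked);
};

/* Handlers come first, so the table below needs no forward declarations. */
static int demo_rx(int length)
{
	return length > 0 ? length : 0;
}

static int demo_ack_tx(int acked)
{
	return acked ? 0 : -1;
}

/* The table follows the functions it points to. */
static struct demo_proto demo_capmode = {
	.suffix = 'r',
	.rx     = demo_rx,
	.ack_tx = demo_ack_tx,
};

static struct demo_proto *demo_default;

/* Unconditional user of the table: with no #ifdef around it, the compiler
 * always sees demo_capmode being referenced. */
static void demo_init(void)
{
	assert(demo_capmode.rx != NULL);
	demo_default = &demo_capmode;
}
```

Calling `demo_init()` once and then dispatching through `demo_default->rx(...)` mirrors, under these assumptions, how a registered protocol table would be used.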


* Re: [PATCH iproute] ip: add support for multicast rules
From: Stephen Hemminger @ 2010-05-19 16:03 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Linux Netdev List
In-Reply-To: <4BC48870.7070704@trash.net>

On Tue, 13 Apr 2010 17:06:24 +0200
Patrick McHardy <kaber@trash.net> wrote:

> This patch adds support for a "ip mrule" command, which is used
> to configure multicast routing rules.
> 
> The corresponding kernel patches have been sent to Dave and
> should (hopefully) appear in net-next soon.

The fib_rules.h file in iproute2 is kept in sync with the kernel
headers. But I do not see the definitions of FIB_RULES_IPV4 etc
in net-next kernel.  What happened to this?



* Re: [PATCHv2] vhost-net: utilize PUBLISH_USED_IDX feature
From: Avi Kivity @ 2010-05-19 16:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: davem, Juan Quintela, Rusty Russell, Paul E. McKenney,
	Arnd Bergmann, kvm, virtualization, netdev, linux-kernel,
	alex.williamson, amit.shah
In-Reply-To: <4BF2D2A7.8030803@redhat.com>

On 05/18/2010 08:47 PM, Avi Kivity wrote:
> On 05/18/2010 05:21 AM, Michael S. Tsirkin wrote:
>> With PUBLISH_USED_IDX, guest tells us which used entries
>> it has consumed. This can be used to reduce the number
>> of interrupts: after we write a used entry, if the guest has not yet
>> consumed the previous entry, or if the guest has already consumed the
>> new entry, we do not need to interrupt.
>> This improves bandwidth by 30% under some workflows.
>
> Seems to be missing the cacheline alignment.
>
> Rusty's clarification did not satisfy me; I think it's needed.
>

Oh, and this should definitely follow the patch to the virtio spec, not 
precede it.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
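
The interrupt-suppression rule quoted in this message can be sketched as a small predicate. This is an illustration under stated assumptions, not the vhost-net implementation: the function name `demo_need_interrupt` and its parameters are hypothetical, and the indices are assumed to be free-running 16-bit counters that wrap around. The host has just advanced its used index from `old_used` to `new_used`, and `guest_consumed` is the index the guest has published. An interrupt is needed only when the guest has consumed everything up to `old_used` but not yet the new entries.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the guest should be interrupted.
 *
 * - guest behind old_used:   it has not consumed the previous entry yet,
 *                            so it will find the new one on its own.
 * - guest at/after new_used: it has already consumed the new entry.
 * In both cases no interrupt is needed. Unsigned 16-bit arithmetic keeps
 * the comparison correct across index wraparound.
 */
static bool demo_need_interrupt(uint16_t old_used, uint16_t new_used,
				uint16_t guest_consumed)
{
	uint16_t behind  = (uint16_t)(guest_consumed - old_used);
	uint16_t written = (uint16_t)(new_used - old_used);

	return behind < written;
}
```

For a single new entry (`new_used == old_used + 1`) this reduces to `guest_consumed == old_used`.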


* Re: [PATCH v3] Fix SJA1000 command register writes on SMP systems
From: Oliver Hartkopp @ 2010-05-19 16:23 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: SocketCAN Core Mailing List, Linux Netdev List, David Miller,
	Wolfgang Grandegger
In-Reply-To: <20100518213109.GA29894-OoSGOWW0KRunlFQ6Q1D1Y0B+6BGkLq7r@public.gmane.org>

On 18.05.2010 23:31, Sam Ravnborg wrote:
> Hi Oliver.
> 
>> diff --git a/drivers/net/can/sja1000/sja1000.h b/drivers/net/can/sja1000/sja1000.h
>> index 97a622b..de8e778 100644
>> --- a/drivers/net/can/sja1000/sja1000.h
>> +++ b/drivers/net/can/sja1000/sja1000.h
>> @@ -167,6 +167,7 @@ struct sja1000_priv {
>>  
>>  	void __iomem *reg_base;	 /* ioremap'ed address to registers */
>>  	unsigned long irq_flags; /* for request_irq() */
>> +	spinlock_t cmdreg_lock;  /* lock for concurrent cmd register writes */
>>  
>>  	u16 flags;		/* custom mode flags */
>>  	u8 ocr;			/* output control register */
> 
> You define your spinlock inside a struct so you cannot use
> DEFINE_SPINLOCK().
> 
> But then you need to use spin_lock_init() - which I fail to see
> you are doing in your patch.

Indeed. Sorry.

Will send an updated patch with spin_lock_init(), e.g. to enable the spinlock debugging ...

Regards,
Oliver
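
The point Sam raises (a lock embedded in a struct cannot use a static initializer such as DEFINE_SPINLOCK() and must be initialized at runtime) can be shown with a userspace analogy. This is only a sketch: it uses a C11 `atomic_flag` in place of a kernel `spinlock_t`, and `struct demo_priv`, `demo_lock_init`, `demo_write_cmd` and `demo_alloc` are hypothetical names, not part of the SJA1000 driver. Here `demo_lock_init()` plays the role that spin_lock_init() plays in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's private data. */
struct demo_priv {
	atomic_flag cmdreg_lock;	/* embedded lock, needs runtime init */
	uint8_t cmdreg;			/* stand-in for the command register */
	unsigned int writes;		/* number of command writes so far */
};

/* Runtime initialization of the embedded lock (analogous to spin_lock_init()). */
static void demo_lock_init(struct demo_priv *priv)
{
	assert(priv != NULL);
	atomic_flag_clear(&priv->cmdreg_lock);
}

/* Serialize writes to the command register stand-in. */
static void demo_write_cmd(struct demo_priv *priv, uint8_t val)
{
	while (atomic_flag_test_and_set_explicit(&priv->cmdreg_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	priv->cmdreg = val;
	priv->writes++;
	atomic_flag_clear_explicit(&priv->cmdreg_lock, memory_order_release);
}

/* Allocate the private data and initialize the lock before first use. */
static struct demo_priv *demo_alloc(void)
{
	struct demo_priv *priv = malloc(sizeof(*priv));

	if (!priv)
		return NULL;
	priv->cmdreg = 0;
	priv->writes = 0;
	demo_lock_init(priv);
	return priv;
}
```

Skipping `demo_lock_init()` would leave the flag in an indeterminate state after `malloc()`, which is the userspace counterpart of the missing initialization noted in the review.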

