Netdev List
* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-08 12:56 UTC
  To: Wei Gu; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA66B@ESGSCCMS0001.eapac.ericsson.se>

Le vendredi 08 avril 2011 à 20:19 +0800, Wei Gu a écrit :
> Hi again,
> I did more testing with CONFIG_DMAR disabled, using both the ixgbe shipped
> with 2.6.38 and the Intel-released 3.2.10/3.1.15.
> In all these tests we can get >1Mpps with 400-byte packets, but it is not
> stable at all: there is a huge number of missed errors with the CPU 100% idle:
> ethtool -S eth10 |grep rx_missed_errors
> 
>         rx_missed_errors: 76832040
> 
> SUM: 1102212 ETH8: 0  ETH10: 1102212 ETH6: 0 ETH4: 0
> SUM: 521841 ETH8: 0  ETH10: 521841 ETH6: 0 ETH4: 0
> SUM: 426776 ETH8: 0  ETH10: 426776 ETH6: 0 ETH4: 0
> SUM: 927520 ETH8: 0  ETH10: 927520 ETH6: 0 ETH4: 0
> SUM: 1171995 ETH8: 0  ETH10: 1171995 ETH6: 0 ETH4: 0
> SUM: 855980 ETH8: 0  ETH10: 855980 ETH6: 0 ETH4: 0
> 
> 
> Do you know of any other kernel options that could cause a high
> rate of rx_missed_errors with low CPU usage? (No problem on 2.6.32 with
> the same test case.)
> 
> perf  record:
> +     69.74%          swapper  [kernel.kallsyms]          [k] poll_idle
> +     11.62%          swapper  [kernel.kallsyms]          [k] intel_idle
> +      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
> +      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
> +      0.77%             perf  [kernel.kallsyms]          [k] skb_copy_bits
> +      0.64%          swapper  [kernel.kallsyms]          [k] skb_copy_bits
> +      0.48%             perf  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> +      0.44%          swapper  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> +      0.36%          swapper  [kernel.kallsyms]          [k] kmem_cache_alloc_node
> +      0.35%          swapper  [kernel.kallsyms]          [k] kfree
> +      0.35%             perf  [kernel.kallsyms]          [k] kmem_cache_alloc_node
> 


Make sure enough cpus serve interrupts, _before_ even starting your
stress test.

Then, make sure traffic is distributed to many different queues.
If a single flow is used, it probably uses a single queue -> single CPU.

Say you have irq affinities set to fffffffffffff  (all cpus able to
serve IRQ X,Y,Z,T,...)

Then you have a network burst (because you start your packet generator
at full rate), spread over many queues.

CPU0 takes hard interrupt for queue 0, eth8, and queues NAPI mode.
CPU0 takes hard interrupt for queue 0, eth10, and queues NAPI mode.
CPU0 takes hard interrupt for queue 1, eth8, and queues NAPI mode.
CPU0 takes hard interrupt for queue 1, eth10, and queues NAPI mode.
CPU0 takes hard interrupt for queue 2, eth8, and queues NAPI mode.
CPU0 takes hard interrupt for queue 2, eth10, and queues NAPI mode.
...
CPU0 takes hard interrupt for queue X, eth8, and queues NAPI mode.
...

Then softirq can start, and only CPU0 is able to handle NAPI for all the
queued devices. You are stuck, with CPU0 never leaving ksoftirqd.

NAPI handling is always performed on the CPU that received the hardware
interrupt, until we exit NAPI (and rearm interrupt delivery).
It cannot migrate to an "idle cpu".
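
A minimal sketch of pinning one IRQ to one CPU, so that the NAPI work for each queue lands on a different core. It assumes the usual /proc/irq/<N>/smp_affinity interface, which takes a hex mask in comma-separated 32-bit groups. The helper names single_cpu_mask() and set_irq_affinity() are made up for illustration, and the IRQ numbers have to be looked up in /proc/interrupts first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the mask that selects exactly one CPU, as comma-separated
 * 32-bit hex groups (most significant group first).
 * Returns 0 on success, -1 if the buffer is too small.
 */
int single_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
	unsigned int group = cpu / 32;	/* number of all-zero low groups */
	unsigned int bit = cpu % 32;	/* bit inside the top group */
	unsigned int i;
	int n;

	assert(buf != NULL);

	n = snprintf(buf, len, "%x", 1u << bit);
	if (n < 0 || (size_t)n >= len)
		return -1;

	for (i = 0; i < group; i++) {
		size_t used = strlen(buf);

		if (used + 9 >= len)	/* ",00000000" plus the NUL */
			return -1;
		memcpy(buf + used, ",00000000", 10);
	}
	return 0;
}

/* Write the single-CPU mask for `cpu` to /proc/irq/<irq>/smp_affinity. */
int set_irq_affinity(int irq, unsigned int cpu)
{
	char path[64], mask[128];
	FILE *f;
	int rc;

	if (single_cpu_mask(cpu, mask, sizeof(mask)) < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");	/* needs root */
	if (f == NULL)
		return -1;
	rc = (fputs(mask, f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

For example, calling set_irq_affinity(irq, 24 + q) for the IRQ of each queue q would give every queue its own core, and this has to be done before the packet generator is started.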




* ARM, AF_PACKET: caching problems on Marvell Kirkwood
From: Phil Sutter @ 2011-04-08 13:06 UTC
  To: linux-arm-kernel; +Cc: netdev, ne

Dear lists,

I am experiencing severe caching issues using the TX_RING feature of
AF_PACKET on a Kirkwood-based system (i.e., OpenRD). This may well be
a bug in the CPU/SoC itself; at least it reacts a bit picky when using
the preload data instruction (pld) in rather useless cases (but that's a
different story).

There is simple testing code at the end of this email, effectively just
preparing a packet in the TX_RING and triggering its delivery once per
second. The symptom is that sporadically nothing goes out in one
iteration, and two packets go out in the following one.

It looks like the kernel doesn't get the changed value of tp_status in
time, although userspace sees the correct value. Note that moving the
sleep(1) from the end of the loop to just before calling sendto() fixes
the problem.

Another (more useful) workaround is to call flush_cache_all() at the
beginning of packet_sendmsg() in net/packet/af_packet.c. I was not able
to fix this with some more specific flushing at that place. Anyway, the
call to flush_dcache_page() from __packet_get_status() in the same
source file is meant to do the trick, I guess. But somehow it doesn't.

Feedback of any kind is highly appreciated, of course!

Greetings, Phil

------------------[start of packet_mmap_test.c]--------------------
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>		/* sleep() */
#include <arpa/inet.h>		/* htons() */
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>

#define PERROR_EXIT(rc, mesg) { \
	perror(mesg); \
	return rc; \
}

int main(void)
{
	uint32_t size;
	struct sockaddr_ll sa;
	struct ifreq ifr;
	int index;
	int tmp;
	int fd;
	struct tpacket_req packet_req;
	struct tpacket2_hdr * ps_header_start, *ps_header;

	if ((fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL))) < 0)
		PERROR_EXIT(EXIT_FAILURE, "socket");

	/* retrieve eth0's interface index number */
	strncpy (ifr.ifr_name, "eth0", sizeof(ifr.ifr_name));
	if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0)
		PERROR_EXIT(EXIT_FAILURE, "ioctl(SIOCGIFINDEX)");

	/* set sockaddr info */
	memset(&sa, 0, sizeof(sa));
	sa.sll_family = AF_PACKET;
	sa.sll_protocol = htons(ETH_P_ALL);
	sa.sll_ifindex = ifr.ifr_ifindex;

	/* bind port */
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "bind()");

	tmp = TPACKET_V2;
	if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &tmp, sizeof(tmp)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "setsockopt(PACKET_VERSION)");

	/* set packet loss option */
	tmp = 1;
	if (setsockopt(fd, SOL_PACKET, PACKET_LOSS, &tmp, sizeof(tmp)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "setsockopt(PACKET_LOSS)");

	/* prepare Tx ring request */
	packet_req.tp_block_size = 1024 * 8;
	packet_req.tp_frame_size = 1024 * 8;
	packet_req.tp_block_nr = 1024;
	packet_req.tp_frame_nr = 1024;

	/* send TX ring request */
	if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING,
	               &packet_req, sizeof(packet_req)) < 0)
		PERROR_EXIT(EXIT_FAILURE, "setsockopt: PACKET_TX_RING");

	/* calculate memory to mmap in the kernel */
	size = packet_req.tp_block_size * packet_req.tp_block_nr;

	/* mmap Tx ring buffers memory */
	ps_header_start = mmap(0, size,
			PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (ps_header_start == MAP_FAILED)
		PERROR_EXIT(EXIT_FAILURE, "mmap");

	/* fill peer sockaddr for SOCK_DGRAM */
	sa.sll_family = AF_PACKET;
	sa.sll_protocol = htons(ETH_P_IP);
	sa.sll_ifindex = ifr.ifr_ifindex;
	sa.sll_halen = ETH_ALEN;
	memset(&sa.sll_addr, 0xff, ETH_ALEN);

	ps_header = ps_header_start;
	while (1) {
		int sendlen, j;

		/* TPACKET_V2 frames use the tpacket2_hdr layout, hence TPACKET2_HDRLEN */
		char *data = (char *)ps_header + TPACKET2_HDRLEN
		              - sizeof(struct sockaddr_ll);

		switch((volatile uint32_t)ps_header->tp_status)
		{
		case TP_STATUS_AVAILABLE:
			memset(data, 0x23, 150);
			break;

		case TP_STATUS_WRONG_FORMAT:
			printf("An error has occurred during transfer\n");
			exit(EXIT_FAILURE);
			break;

		default:
			printf("Buffer is not available, aborting\n");
			exit(1);
			break;
		}
		ps_header->tp_len = 150;
		ps_header->tp_status = TP_STATUS_SEND_REQUEST;

		sendlen = sendto(fd, NULL, 0, 0,
				(struct sockaddr *)&sa, sizeof(sa));
		if (sendlen < 0)
			perror("sendto");
		else if (sendlen == 0)
			printf("sendto(): nothing sent!\n");
		else
			printf("sendto(): sent %d bytes out\n", sendlen);

#define ST_IS(x) ((volatile uint32_t)ps_header->tp_status == x)
		printf("tp_status after sending: %s\n",
				ST_IS(TP_STATUS_AVAILABLE) ? "AVAILABLE" :
				ST_IS(TP_STATUS_SEND_REQUEST) ? "SEND_REQUEST" :
				ST_IS(TP_STATUS_WRONG_FORMAT) ? "WRONG_FORMAT" :
				"unknown");
#undef ST_IS

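		/* advance by bytes; "size" is a byte count, so compare as char pointers */
		ps_header = (void *)((char *)ps_header + packet_req.tp_frame_size);
		if ((char *)ps_header >= (char *)ps_header_start + size)
			ps_header = ps_header_start;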

		sleep(1);
	}
	return 0;
}
--------------------[end of packet_mmap_test.c]--------------------
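
A small sketch of the ring walk used by the test loop above, written with a frame index instead of a moving header pointer. It only illustrates the address arithmetic (byte offsets, wraparound after the last frame); ring_frame() and ring_next() are hypothetical helper names and not part of the AF_PACKET API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Address of frame `idx` in a ring mapped at `base`.
 * The arithmetic is done on a byte pointer, so the offset is in bytes
 * no matter which header struct the caller casts the result to.
 */
void *ring_frame(void *base, size_t frame_size, unsigned int idx)
{
	assert(base != NULL);
	return (uint8_t *)base + (size_t)idx * frame_size;
}

/* Index of the frame after `idx`, wrapping to 0 after the last frame. */
unsigned int ring_next(unsigned int idx, unsigned int frame_nr)
{
	assert(frame_nr > 0);
	return (idx + 1) % frame_nr;
}
```

With the request used in the test (tp_frame_size = 8192, tp_frame_nr = 1024), the loop would fetch ring_frame(ps_header_start, 8192, i) and then set i = ring_next(i, 1024) on each iteration.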


* [PATCH] igb: restore EEPROM 16kB access limit
From: Stefan Assmann @ 2011-04-08 13:34 UTC
  To: netdev; +Cc: e1000-devel, john.ronciak

The check limiting the EEPROM access up to 16kB was removed by
commit 4322e561a93ec7ee034b603a6c610e7be90d4e8a. Without this check
the kernel will try to checksum the EEPROM up to 2MB (observed with
a 8086:10c9 NIC) and fail.

igb 0000:03:00.0: 0 vfs allocated
igb 0000:03:00.0: The NVM Checksum Is Not Valid
ACPI: PCI interrupt for device 0000:03:00.0 disabled
igb: probe of 0000:03:00.0 failed with error -5

The reason is an overflow of the u16 field word_size in struct
e1000_nvm_info when doing "nvm->word_size = 1 << size;" with size == 21.
Put the check back in place.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
---
 drivers/net/igb/e1000_82575.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 6b256c2..5cfa37f 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -244,6 +244,10 @@ static s32 igb_get_invariants_82575(struct e1000_hw *hw)
 	 */
 	size += NVM_WORD_SIZE_BASE_SHIFT;
 
+	/* EEPROM access above 16k is unsupported */
+	if (size > 14)
+		size = 14;
+
 	nvm->word_size = 1 << size;
 	if (nvm->word_size == (1 << 15))
 		nvm->page_size = 128;
-- 
1.7.4
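
The overflow described in the commit message can be reproduced in isolation. This is only a sketch: uint16_t stands in for the kernel's u16, and MAX_WORD_SIZE_SHIFT is a made-up name for the literal 14 used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORD_SIZE_SHIFT 14	/* hypothetical name for the patch's limit */

/* What happens without the check: the result is truncated to 16 bits. */
uint16_t word_size_unclamped(unsigned int size)
{
	assert(size < 32);
	return (uint16_t)(1u << size);
}

/* Same computation with the limit from the patch applied first. */
uint16_t word_size_clamped(unsigned int size)
{
	if (size > MAX_WORD_SIZE_SHIFT)
		size = MAX_WORD_SIZE_SHIFT;
	return (uint16_t)(1u << size);
}
```

With size == 21 the unclamped version yields 0, because bit 21 does not fit into 16 bits, while the clamped version yields 16384.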


* [PATCH] bonding:set save_load to 0 when initializing
From: Weiping Pan(潘卫平) @ 2011-04-08 13:40 UTC
  To: fubar, andy; +Cc: netdev, linux-kernel, Weiping Pan 

It is unnecessary to set save_load to 1 here,
as the tx_hashtbl is just kzalloced.

Signed-off-by: Weiping Pan(潘卫平) <panweiping3@gmail.com>
---
 drivers/net/bonding/bond_alb.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 9bc5de3..ab69e5a 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -176,7 +176,7 @@ static int tlb_initialize(struct bonding *bond)
 	bond_info->tx_hashtbl = new_hashtbl;
 
 	for (i = 0; i < TLB_HASH_TABLE_SIZE; i++) {
-		tlb_init_table_entry(&bond_info->tx_hashtbl[i], 1);
+		tlb_init_table_entry(&bond_info->tx_hashtbl[i], 0);
 	}
 
 	_unlock_tx_hashtbl(bond);
-- 
1.7.4


* Re: Kernel panic when using bridge
From: Sebastian Nickel @ 2011-04-08 13:49 UTC
  To: netdev
In-Reply-To: <4D9E62D9.5010400@scotdoyle.com>

Scot Doyle <lkml <at> scotdoyle.com> writes:

> 
> This kernel panic occurs when using a bridge. I would be grateful for 
> any ideas on how to correct it.
> 
> The panic was captured on two servers after three or four days of 
> minimal use, both configured as follows:
> - unpatched kernel 2.6.39-rc1 (commit 
> ecb78ab6f30106ab72a575a25b1cdfd1633b7ca2) with default .config options
> - br0 on single intel igb NIC (3 other NIC's unused)
> - br0 with ip address on distinct /27 subnet
> - br0:1 with ip address on distinct /24 subnet
> - br0:2 with ip address on distinct /24 subnet
> - no iptables rules
> - ebtables not installed
> 
> "net/bridge/br_netfilter.c" and "net/ipv4/ip_options.c" (in the current 
> 2.6.39-rc2 and in net-next-2.6) are identical to the versions used to 
> build this kernel.
> 

We have the same problem with kernel version 2.6.37.4 (unpatched), with almost
the same stack trace.

After some time there is a kernel panic.

-br0 contains a realtek NIC (8169) and some vnet devices used with KVM.

Any ideas would be great...



* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-08 14:10 UTC
  To: Eric Dumazet; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <1302267400.4409.22.camel@edumazet-laptop>

Hi,
I see what you mean.
But as I described before, I start eth10 with 8 rx queues and 8 tx queues, and then bind each of these 8 tx/rx queue pairs to one of the CPU cores 24-32 (NUMA3), which I think should give the best performance in my case (it does on Linux 2.6.32):
single queue -> single CPU
As for the packet generator, I configure the IXIA to continuously increment the destination IP address of the packets sent to the test server, so the packets are evenly distributed across the receive queues of eth10. According to the IXIA tools the transmit shape was really good, without many peaks.

What I observed on Linux 2.6.38 during the test is that no ksoftirqd was stressed (< 3% SI on each of cores 24-31) while the packet loss happened, so we are not really stressing the CPU :). It looks like we are limited by some memory bandwidth (DMA) on this release.

With the same test case on 2.6.32 there is no such problem at all. It runs pretty stably at > 2Mpps without rx_missed_errors. There is no HW limitation on this DL580.


BTW, what are these "swapper" entries?
+      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
+      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
Why is ixgbe_poll attributed to swapper/perf?

Thanks
WeiGu


* Re: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Stephen Hemminger @ 2011-04-08 14:49 UTC
  To: Wei Gu; +Cc: Eric Dumazet, Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA682@ESGSCCMS0001.eapac.ericsson.se>

On Fri, 8 Apr 2011 22:10:50 +0800
Wei Gu <wei.gu@ericsson.com> wrote:

> Hi,
> Got you mean.
> But as I decribed before, I start the eth10 with 8 rx queues and 8 tx queues, and then I binding these 8 tx&rx queue each to CPU core 24-32 (NUMA3), which I think could gain the best performance in my case (It's true on Linux 2.6.32)
> single queue ->single CPU
> Then I can descibe a little bit with packet generator, I config the IXIA to continues increase the dest ip address towards the test server, so the packet was evenly distributed to each receving queues of the eth10. And according the IXIA tools the transmit sharp was really good, no too much peaks
> 
> What I observed on Linux 2.6.38 during the test, there is no softqd was stressed (< 03% on SI for each core(24-31)) while the packet lost happens, so we are not really stress the CPU:), It looks like we are limited  on some memory bandwidth (DMA) on this release
> 
> And with same test case on 2.6.32, no such problem at all. It running pretty stable > 2Mpps without rx_missing_error. There is no HW limitation on this DL580
> 
> 
> BTW what is these "swapper"
> +      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
> +      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
> Why the ixgbe_poll was on swapper/perf?
> 
> Thanks
> WeiGu
> 

For performance, you need to assign each network interrupt to a single
CPU. There is no load balancing effect in the IRQ controller.

If you have a multi-socket system, then it is a good idea to put the IRQs
for the NICs on the same socket as the bus interface. Multi-socket systems
are really NUMA, and putting an IRQ on a non-local CPU has a measurable impact.
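
To find out which node is local to a NIC, the node can be read from sysfs. A minimal sketch, assuming the NIC is a PCI device so that /sys/class/net/<ifname>/device/numa_node exists (the file holds -1 when the platform reports no NUMA information). read_int_file() and nic_numa_node() are made-up helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single decimal integer from a file. Returns 0 on success, -1 on error. */
int read_int_file(const char *path, int *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%d", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * NUMA node of the device behind a network interface.
 * Returns 0 and stores the node (possibly -1) in *node, or -1 on error.
 */
int nic_numa_node(const char *ifname, int *node)
{
	char path[128];
	int n;

	n = snprintf(path, sizeof(path),
		     "/sys/class/net/%s/device/numa_node", ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return read_int_file(path, node);
}
```

The IRQs of that NIC would then be pinned, one per CPU, to cores that belong to the reported node.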



-- 


* Re: Kernel panic when using bridge
From: Scot Doyle @ 2011-04-08 14:57 UTC
  To: Sebastian Nickel; +Cc: netdev, Pallai Roland
In-Reply-To: <loom.20110408T153726-822@post.gmane.org>

On 04/08/2011 08:49 AM, Sebastian Nickel wrote:
> We have the same problems with kernel version 2.6.37.4 (unpatched). Almost same
> stacktrace.
>
> After some time there is a kernel panic.
>
> -br0 contains a realtek NIC (8169) and some vnet devices used with KVM.
>
> Any ideas would be great...

Perhaps the problem is isolated to the bridging code? Neither KVM guest 
nor associated tap device were running during my second reported panic.

Here's a similar stacktrace from a third person:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620201


* Re: [RFC 1/6]Fix typo "recieve" in various parts of the kernel.
From: Justin P. Mattock @ 2011-04-08 15:06 UTC
  To: Ricard Wanderlof
  Cc: Uwe Kleine-König, trivial@kernel.org,
	linux-scsi@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	socketcan-core@lists.berlios.de, linux-mtd@lists.infradead.org,
	linux-input@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <Pine.LNX.4.64.1104080909040.5186@lnxricardw.se.axis.com>

On 04/08/2011 12:14 AM, Ricard Wanderlof wrote:
>
> On Fri, 8 Apr 2011, Justin P. Mattock wrote:
>
>> On 04/07/2011 01:05 PM, Uwe Kleine-König wrote:
>>> On Thu, Apr 07, 2011 at 09:09:22AM -0700, Justin P. Mattock wrote:
>>>> The patch below fixes some typos "recieve" in various parts of the
>>>> kernel.
>>>> Note: these below are in actual code rather than comments(excpet for
>>>> r852.c which
>>> s/excpet/except/
>>>
>>>> has a code fix, and comment).
>>>> compile tested as best as I can...
>>> I'm not a native speaker, but "as good as" sounds better in my ears.
>>>
>>> Best regards
>>> Uwe
>>>
>>
>> your right.. gcc and friends are the one's doing all the reall work:
>> "as good as *"
>
> I think Uwe was thinking of grammar rather than who was doing the actual
> work.
>
> Now that I've gone more or less off topic I may as well voice my opinion
> the 'as best as I can' sounds like a mixup of two constructs: either 'as
> best I can' (which is probably slightly unusual these days), or 'as well
> as I can'. 'I've done it as good as I can' is gramatically wrong (since
> 'good' is an adjective and 'well' is an adverb, and it is referring to
> 'how I did it' (or in the case above, 'how well I tested it'), but
> fairly common nevertheless, and given the large percentage of non-native
> speakers in the Linux community I wouldn't worry about it, the meaning
> is clear anyway.
>
> I'll shut up and go now. :-)
>
> /Ricard


well... pretty good explanation.

Justin P. Mattock


* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-08 15:07 UTC
  To: Wei Gu; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA682@ESGSCCMS0001.eapac.ericsson.se>

Le vendredi 08 avril 2011 à 22:10 +0800, Wei Gu a écrit :
> Hi,
> Got you mean.
> But as I decribed before, I start the eth10 with 8 rx queues and 8 tx
> queues, and then I binding these 8 tx&rx queue each to CPU core 24-32
> (NUMA3), which I think could gain the best performance in my case
> (It's true on Linux 2.6.32)
> single queue ->single CPU

Try with other cpus? Maybe a mix.

Maybe your assumption is wrong, and you chose cpus that were not
the best candidates. This was OK in 2.6.32 because you were lucky.

Using cpus from a single NUMA node is not very good, since only one
NUMA node is going to be used, and the other NUMA nodes are idle.


NUMA binding is tricky. Linux tries to use the local node, hoping that all
cpus are running and using local memory. In the end, global throughput is
better.

But if your workload uses cpus from one single node, then it means you
lose part of the memory bandwidth.


> Then I can descibe a little bit with packet generator, I config the
> IXIA to continues increase the dest ip address towards the test
> server, so the packet was evenly distributed to each receving queues
> of the eth10. And according the IXIA tools the transmit sharp was
> really good, no too much peaks
> 
> What I observed on Linux 2.6.38 during the test, there is no softqd
> was stressed (< 03% on SI for each core(24-31)) while the packet lost
> happens, so we are not really stress the CPU:), It looks like we are
> limited  on some memory bandwidth (DMA) on this release

That would mean you chose the wrong cpus to handle this load.


> 
> And with same test case on 2.6.32, no such problem at all. It running
> pretty stable > 2Mpps without rx_missing_error. There is no HW
> limitation on this DL580
> 
> 
> BTW what is these "swapper"
> +      0.80%          swapper  [ixgbe]                    [k]
> ixgbe_poll
> +      0.79%             perf  [ixgbe]                    [k]
> ixgbe_poll
> Why the ixgbe_poll was on swapper/perf?
> 

Softirqs are run on behalf of the currently interrupted thread, unless
load is high and processing moves to ksoftirqd.

It can be the "idle task" (shown as "swapper"), the "perf" task, or any
other one...





* Re: [PATCH] net: r8169: convert to hw_features
From: David Dillow @ 2011-04-08 15:37 UTC
  To: Michał Mirosław; +Cc: netdev, Francois Romieu
In-Reply-To: <20110408124406.GA22478@rere.qmqm.pl>

On Fri, 2011-04-08 at 14:44 +0200, Michał Mirosław wrote:
> On Fri, Apr 08, 2011 at 02:38:46PM +0200, Michał Mirosław wrote:
> > This enables SG+IP_CSUM+TSO by default (there were no comments suggesting
> > leaving them out was intentional).
> > 
> > This also fixes confusion around vlan_features in rtl8169_vlan_mode().
> 
> BTW, I noticed that TSO will break for MTU > 4095+(TCP+IP header len).
> This needs handing in ndo_fix_features callback like other MTU-limited TSO
> engines.

I'd suggest leaving SG/CSUM/TSO off by default -- I played with getting
them working some time ago, and IIRC the current code doesn't handle all
devices properly. Add the issue you note above and you are knowingly
leaving landmines laying about for users of a popular piece of hardware.

Realtek, Francois, please correct me if I'm mistaken.

Dave



* Re: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Alexander Duyck @ 2011-04-08 16:22 UTC
  To: Wei Gu; +Cc: Eric Dumazet, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA66B@ESGSCCMS0001.eapac.ericsson.se>

On 4/8/2011 5:19 AM, Wei Gu wrote:
> Hi again,
> I did more testing with CONFIG_DMAR disabled, using both the ixgbe shipped with 2.6.38 and the Intel-released 3.2.10/3.1.15.
> In all these tests we can get >1Mpps with 400-byte packets, but it is not stable at all: there is a huge number of missed errors with the CPU 100% idle:
> ethtool -S eth10 |grep rx_missed_errors
>
>          rx_missed_errors: 76832040
>
> SUM: 1102212 ETH8: 0  ETH10: 1102212 ETH6: 0 ETH4: 0
> SUM: 521841 ETH8: 0  ETH10: 521841 ETH6: 0 ETH4: 0
> SUM: 426776 ETH8: 0  ETH10: 426776 ETH6: 0 ETH4: 0
> SUM: 927520 ETH8: 0  ETH10: 927520 ETH6: 0 ETH4: 0
> SUM: 1171995 ETH8: 0  ETH10: 1171995 ETH6: 0 ETH4: 0
> SUM: 855980 ETH8: 0  ETH10: 855980 ETH6: 0 ETH4: 0
>
>
> Do you know of any other kernel options that could cause a high rate of rx_missed_errors with low CPU usage? (No problem on 2.6.32 with the same test case.)
>
> perf  record:
> +     69.74%          swapper  [kernel.kallsyms]          [k] poll_idle
> +     11.62%          swapper  [kernel.kallsyms]          [k] intel_idle
> +      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
> +      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
> +      0.77%             perf  [kernel.kallsyms]          [k] skb_copy_bits
> +      0.64%          swapper  [kernel.kallsyms]          [k] skb_copy_bits
> +      0.48%             perf  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> +      0.44%          swapper  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> +      0.36%          swapper  [kernel.kallsyms]          [k] kmem_cache_alloc_node
> +      0.35%          swapper  [kernel.kallsyms]          [k] kfree
> +      0.35%             perf  [kernel.kallsyms]          [k] kmem_cache_alloc_node

I was wondering if you could dump all of your ethtool stats instead of 
just the rx_missed_errors as this will provide us with much more info to 
work with.

I'm mainly interested in seeing if the rx_no_buffer_count is 
incrementing as well.  If it is not then what you may be seeing is a bus 
bandwidth issue depending on what slot you are in.

Also if you could provide an lspci dump for the part that would also 
give us some additional information on your PCIe bus configuration.
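
For comparing the two counters over time, something like the following could be used on saved "ethtool -S" output. It is only a sketch: parse_stat_line() and looks_bus_limited() are hypothetical helpers, and the second one merely encodes the rule of thumb from the paragraph above (missed errors growing while rx_no_buffer_count stays flat), not a definitive diagnosis.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "   name: value" line as printed by "ethtool -S".
 * Returns 0 and stores the value if the line is for `name`, -1 otherwise.
 */
int parse_stat_line(const char *line, const char *name,
		    unsigned long long *value)
{
	const char *num;
	char *end;
	size_t n;

	assert(line != NULL && name != NULL && value != NULL);

	n = strlen(name);
	while (isspace((unsigned char)*line))
		line++;
	if (strncmp(line, name, n) != 0 || line[n] != ':')
		return -1;

	num = line + n + 1;
	*value = strtoull(num, &end, 10);
	return (end == num) ? -1 : 0;	/* -1 if no digits followed the colon */
}

/* Non-zero if packets were missed although the rx ring never ran dry. */
int looks_bus_limited(unsigned long long missed_delta,
		      unsigned long long no_buffer_delta)
{
	return missed_delta > 0 && no_buffer_delta == 0;
}
```

The deltas would come from two snapshots of the statistics taken a few seconds apart while the test is running.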

Thanks,

Alex


* [PATCH v2] net: r8169: convert to hw_features
From: Michał Mirosław @ 2011-04-08 16:35 UTC
  To: netdev; +Cc: Francois Romieu, David Dillow
In-Reply-To: <1302277067.19764.2.camel@obelisk.thedillows.org>

Simple conversion with a bit of needed cleanup.

This also fixes:
 - confusion around vlan_features in rtl8169_vlan_mode(),
 - problem with broken TSO for too big MTU (the limit is set
   at 0xFFF --- max MSS field value).

SG+IP_CSUM+TSO is left disabled by default, based on suggestion by
David Dillow.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/r8169.c |   95 ++++++++++++++++++---------------------------------
 1 files changed, 33 insertions(+), 62 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index caa99cd..058524f 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1286,14 +1286,15 @@ static int rtl8169_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 	return ret;
 }
 
-static u32 rtl8169_get_rx_csum(struct net_device *dev)
+static u32 rtl8169_fix_features(struct net_device *dev, u32 features)
 {
-	struct rtl8169_private *tp = netdev_priv(dev);
+	if (dev->mtu > MSSMask)
+		features &= ~NETIF_F_ALL_TSO;
 
-	return tp->cp_cmd & RxChkSum;
+	return features;
 }
 
-static int rtl8169_set_rx_csum(struct net_device *dev, u32 data)
+static int rtl8169_set_features(struct net_device *dev, u32 features)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -1301,11 +1302,16 @@ static int rtl8169_set_rx_csum(struct net_device *dev, u32 data)
 
 	spin_lock_irqsave(&tp->lock, flags);
 
-	if (data)
+	if (features & NETIF_F_RXCSUM)
 		tp->cp_cmd |= RxChkSum;
 	else
 		tp->cp_cmd &= ~RxChkSum;
 
+	if (dev->features & NETIF_F_HW_VLAN_RX)
+		tp->cp_cmd |= RxVlan;
+	else
+		tp->cp_cmd &= ~RxVlan;
+
 	RTL_W16(CPlusCmd, tp->cp_cmd);
 	RTL_R16(CPlusCmd);
 
@@ -1321,27 +1327,6 @@ static inline u32 rtl8169_tx_vlan_tag(struct rtl8169_private *tp,
 		TxVlanTag | swab16(vlan_tx_tag_get(skb)) : 0x00;
 }
 
-#define NETIF_F_HW_VLAN_TX_RX	(NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX)
-
-static void rtl8169_vlan_mode(struct net_device *dev)
-{
-	struct rtl8169_private *tp = netdev_priv(dev);
-	void __iomem *ioaddr = tp->mmio_addr;
-	unsigned long flags;
-
-	spin_lock_irqsave(&tp->lock, flags);
-	if (dev->features & NETIF_F_HW_VLAN_RX)
-		tp->cp_cmd |= RxVlan;
-	else
-		tp->cp_cmd &= ~RxVlan;
-	RTL_W16(CPlusCmd, tp->cp_cmd);
-	/* PCI commit */
-	RTL_R16(CPlusCmd);
-	spin_unlock_irqrestore(&tp->lock, flags);
-
-	dev->vlan_features = dev->features &~ NETIF_F_HW_VLAN_TX_RX;
-}
-
 static void rtl8169_rx_vlan_tag(struct RxDesc *desc, struct sk_buff *skb)
 {
 	u32 opts2 = le32_to_cpu(desc->opts2);
@@ -1522,28 +1507,6 @@ static void rtl8169_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 	}
 }
 
-static int rtl8169_set_flags(struct net_device *dev, u32 data)
-{
-	struct rtl8169_private *tp = netdev_priv(dev);
-	unsigned long old_feat = dev->features;
-	int rc;
-
-	if ((tp->mac_version == RTL_GIGA_MAC_VER_05) &&
-	    !(data & ETH_FLAG_RXVLAN)) {
-		netif_info(tp, drv, dev, "8110SCd requires hardware Rx VLAN\n");
-		return -EINVAL;
-	}
-
-	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_TXVLAN | ETH_FLAG_RXVLAN);
-	if (rc)
-		return rc;
-
-	if ((old_feat ^ dev->features) & NETIF_F_HW_VLAN_RX)
-		rtl8169_vlan_mode(dev);
-
-	return 0;
-}
-
 static const struct ethtool_ops rtl8169_ethtool_ops = {
 	.get_drvinfo		= rtl8169_get_drvinfo,
 	.get_regs_len		= rtl8169_get_regs_len,
@@ -1552,19 +1515,12 @@ static const struct ethtool_ops rtl8169_ethtool_ops = {
 	.set_settings		= rtl8169_set_settings,
 	.get_msglevel		= rtl8169_get_msglevel,
 	.set_msglevel		= rtl8169_set_msglevel,
-	.get_rx_csum		= rtl8169_get_rx_csum,
-	.set_rx_csum		= rtl8169_set_rx_csum,
-	.set_tx_csum		= ethtool_op_set_tx_csum,
-	.set_sg			= ethtool_op_set_sg,
-	.set_tso		= ethtool_op_set_tso,
 	.get_regs		= rtl8169_get_regs,
 	.get_wol		= rtl8169_get_wol,
 	.set_wol		= rtl8169_set_wol,
 	.get_strings		= rtl8169_get_strings,
 	.get_sset_count		= rtl8169_get_sset_count,
 	.get_ethtool_stats	= rtl8169_get_ethtool_stats,
-	.set_flags		= rtl8169_set_flags,
-	.get_flags		= ethtool_op_get_flags,
 };
 
 static void rtl8169_get_mac_version(struct rtl8169_private *tp,
@@ -2979,6 +2935,8 @@ static const struct net_device_ops rtl8169_netdev_ops = {
 	.ndo_tx_timeout		= rtl8169_tx_timeout,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_change_mtu		= rtl8169_change_mtu,
+	.ndo_fix_features	= rtl8169_fix_features,
+	.ndo_set_features	= rtl8169_set_features,
 	.ndo_set_mac_address	= rtl_set_mac_address,
 	.ndo_do_ioctl		= rtl8169_ioctl,
 	.ndo_set_multicast_list	= rtl_set_rx_mode,
@@ -3425,7 +3383,19 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	netif_napi_add(dev, &tp->napi, rtl8169_poll, R8169_NAPI_WEIGHT);
 
-	dev->features |= NETIF_F_HW_VLAN_TX_RX | NETIF_F_GRO;
+	/* don't enable SG, IP_CSUM and TSO by default - it might not work
+	 * properly for all devices */
+	dev->features |= NETIF_F_RXCSUM |
+		NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+
+	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
+		NETIF_F_RXCSUM | NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
+		NETIF_F_HIGHDMA;
+
+	if (tp->mac_version == RTL_GIGA_MAC_VER_05)
+		/* 8110SCd requires hardware Rx VLAN - disallow toggling */
+		dev->hw_features &= ~NETIF_F_HW_VLAN_RX;
 
 	tp->intr_mask = 0xffff;
 	tp->hw_start = cfg->hw_start;
@@ -3545,7 +3515,7 @@ static int rtl8169_open(struct net_device *dev)
 
 	rtl8169_init_phy(dev, tp);
 
-	rtl8169_vlan_mode(dev);
+	rtl8169_set_features(dev, dev->features);
 
 	rtl_pll_power_up(tp);
 
@@ -4318,6 +4288,8 @@ static int rtl8169_change_mtu(struct net_device *dev, int new_mtu)
 		return -EINVAL;
 
 	dev->mtu = new_mtu;
+	netdev_update_features(dev);
+
 	return 0;
 }
 
@@ -4642,12 +4614,11 @@ err_out:
 
 static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
 {
-	if (dev->features & NETIF_F_TSO) {
-		u32 mss = skb_shinfo(skb)->gso_size;
+	u32 mss = skb_shinfo(skb)->gso_size;
+
+	if (mss)
+		return LargeSend | ((mss & MSSMask) << MSSShift);
 
-		if (mss)
-			return LargeSend | ((mss & MSSMask) << MSSShift);
-	}
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		const struct iphdr *ip = ip_hdr(skb);
 
-- 
1.7.2.5


^ permalink raw reply related
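
The MTU check that the patch above adds in rtl8169_fix_features() can be
modelled in userspace as follows. This is a minimal sketch, not the driver
code: the DEMO_* flag values are made up for illustration (the real
NETIF_F_* bits differ), a single TSO bit stands in for NETIF_F_ALL_TSO, and
demo_fix_features() is a hypothetical name. Only the 0xFFF limit is taken
from the patch description.

```c
#include <stdint.h>

/* Illustrative flag values only; not the kernel's NETIF_F_* bits. */
#define DEMO_F_SG	(1u << 0)
#define DEMO_F_IP_CSUM	(1u << 1)
#define DEMO_F_TSO	(1u << 2)
#define DEMO_MSS_MASK	0xfffu	/* max MSS field value, per the patch */

/* Drop TSO from the requested feature set when the MTU exceeds the limit. */
static uint32_t demo_fix_features(unsigned int mtu, uint32_t features)
{
	if (mtu > DEMO_MSS_MASK)
		features &= ~DEMO_F_TSO;

	return features;
}
```

With these definitions, an MTU of 4095 keeps TSO while an MTU of 4096 or
more masks it out, which is why the patch also re-evaluates features when
the MTU changes.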

* RE: [PATCH] igb: restore EEPROM 16kB access limit
From: Wyborny, Carolyn @ 2011-04-08 16:40 UTC (permalink / raw)
  To: Stefan Assmann, netdev@vger.kernel.org
  Cc: e1000-devel@lists.sourceforge.net, Kirsher, Jeffrey T,
	Pieper, Jeffrey E, Ronciak, John
In-Reply-To: <1302269695-27188-1-git-send-email-sassmann@kpanic.de>



>-----Original Message-----
>From: Stefan Assmann [mailto:sassmann@kpanic.de]
>Sent: Friday, April 08, 2011 6:35 AM
>To: netdev@vger.kernel.org
>Cc: e1000-devel@lists.sourceforge.net; Kirsher, Jeffrey T; Pieper,
>Jeffrey E; Wyborny, Carolyn; Ronciak, John
>Subject: [PATCH] igb: restore EEPROM 16kB access limit
>
>The check limiting the EEPROM access up to 16kB was removed by
>commit 4322e561a93ec7ee034b603a6c610e7be90d4e8a. Without this check
>the kernel will try to checksum the EEPROM up to 2MB (observed with
>a 8086:10c9 NIC) and fail.
>
>igb 0000:03:00.0: 0 vfs allocated
>igb 0000:03:00.0: The NVM Checksum Is Not Valid
>ACPI: PCI interrupt for device 0000:03:00.0 disabled
>igb: probe of 0000:03:00.0 failed with error -5
>
>Reason for that being an overflow in u16 e1000_nvm_info->nvm
>while doing "nvm->word_size = 1 << size;" with size == 21.
>Putting the check back in place.
>
>Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
>---
> drivers/net/igb/e1000_82575.c |    4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/net/igb/e1000_82575.c
>b/drivers/net/igb/e1000_82575.c
>index 6b256c2..5cfa37f 100644
>--- a/drivers/net/igb/e1000_82575.c
>+++ b/drivers/net/igb/e1000_82575.c
>@@ -244,6 +244,10 @@ static s32 igb_get_invariants_82575(struct e1000_hw
>*hw)
> 	 */
> 	size += NVM_WORD_SIZE_BASE_SHIFT;
>
>+	/* EEPROM access above 16k is unsupported */
>+	if (size > 14)
>+		size = 14;
>+
> 	nvm->word_size = 1 << size;
> 	if (nvm->word_size == (1 << 15))
> 		nvm->page_size = 128;
>--
>1.7.4
NACK

This doesn't apply against the current upstream RC kernel.  That commit changed more than just the removal of this check.  There is a missing section of code that is needed, but it is not this one.  The missing section starts at line 251 in e1000_82575.c:

--snip--
        /* NVM Function Pointers */
        nvm->ops.acquire = igb_acquire_nvm_82575;
        if (nvm->word_size < (1 << 15))
                nvm->ops.read = igb_read_nvm_eerd;
        else
                nvm->ops.read = igb_read_nvm_spi;
--snip--

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation



^ permalink raw reply
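
The overflow described in the quoted patch ("nvm->word_size = 1 << size"
stored in a u16 with size == 21) can be reproduced in plain C. This is a
sketch under assumptions: demo_word_size() and demo_word_size_clamped() are
hypothetical helpers, uint16_t stands in for the kernel's u16, and size is
assumed to be below 32.

```c
#include <stdint.h>

/* Word count as stored in a 16-bit field, like the u16 word_size. */
static uint16_t demo_word_size(unsigned int size)
{
	/* Conversion to uint16_t keeps only the low 16 bits. */
	return (uint16_t)(1u << size);
}

/* Same computation with the clamp proposed in the patch. */
static uint16_t demo_word_size_clamped(unsigned int size)
{
	if (size > 14)
		size = 14;

	return (uint16_t)(1u << size);
}
```

For size == 21 the unclamped helper returns 0, because bit 21 does not fit
in 16 bits, while the clamped helper returns 16384.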

* Re: [PATCH net-2.6] ixgbe: only enable WoL for magic packet by default
From: Jeff Kirsher @ 2011-04-08 18:13 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, martin.wilck, john.r.fastabend
In-Reply-To: <1302187315-16487-1-git-send-email-andy@greyhouse.net>

On Thu, Apr 7, 2011 at 07:41, Andy Gospodarek <andy@greyhouse.net> wrote:
> Martin Wilck <martin.wilck@ts.fujitsu.com> reported that systems using
> the ixgbe-driver that were capable of WoL were rebooting almost as soon
> as they were shut down.  This is because the default WoL settings
> enabled magic packet, broadcast, unicast, and multicast.
>
> Other Intel devices seem to use the stored eeprom value for initial WoL
> capabilities.  The 82578DM (e1000e) and 82576 (igb) devices I looked
> at had only the magic packet enabled in the eeprom, so that seems
> appropriate on ixgbe-based devices as well.  I set the WoL options on my
> 82578DM to be the same default as the ixgbe devices (umbg) and saw the
> same as Martin -- almost as soon as my box shutdown, it booted again.
>
> This patch changes the default to only be the magic packet.  This is the
> same as the default for most Intel and non-Intel hardware currently
> upstream.
>
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> CC: Martin Wilck <martin.wilck@ts.fujitsu.com>
>

Thanks Andy! I have added the patch to my queue.

-- 
Cheers,
Jeff

^ permalink raw reply
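
The difference between the old and new WoL defaults discussed above can be
written down as bit masks. This is a minimal sketch: the DEMO_WAKE_* values
are meant to mirror the WAKE_* bits in the kernel's ethtool.h, while the two
DEMO_WOL_*_DEFAULT names and demo_wakes_on_ordinary_traffic() are
hypothetical and do not exist in the ixgbe driver.

```c
#include <stdint.h>

/* Intended to mirror WAKE_UCAST, WAKE_MCAST, WAKE_BCAST, WAKE_MAGIC. */
#define DEMO_WAKE_UCAST	(1u << 1)	/* 'u' in ethtool */
#define DEMO_WAKE_MCAST	(1u << 2)	/* 'm' */
#define DEMO_WAKE_BCAST	(1u << 3)	/* 'b' */
#define DEMO_WAKE_MAGIC	(1u << 5)	/* 'g' */

/* Default before the patch ("umbg") and after it (magic packet only). */
#define DEMO_WOL_OLD_DEFAULT \
	(DEMO_WAKE_UCAST | DEMO_WAKE_MCAST | DEMO_WAKE_BCAST | DEMO_WAKE_MAGIC)
#define DEMO_WOL_NEW_DEFAULT	DEMO_WAKE_MAGIC

/* True if regular unicast, multicast or broadcast frames wake the host. */
static int demo_wakes_on_ordinary_traffic(uint32_t wolopts)
{
	return (wolopts & (DEMO_WAKE_UCAST | DEMO_WAKE_MCAST |
			   DEMO_WAKE_BCAST)) != 0;
}
```

The old default returns true here, which matches the reported behaviour of
machines powering back on shortly after shutdown on a busy network.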

* Re: extending feature word.
From: Mahesh Bandewar @ 2011-04-08 18:17 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: linux-netdev, Ben Hutchings, David Miller
In-Reply-To: <20110408100535.GB10565@rere.qmqm.pl>

On Fri, Apr 8, 2011 at 3:05 AM, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
> On Fri, Apr 01, 2011 at 07:07:05PM -0700, Mahesh Bandewar wrote:
>> Thanks for your comments on my loop-back patch. I was looking at the
>> code today from the perspective of extending various "features" for
>> word to an array of words and as Michael has pointed out, it's a huge
>> change. So I'm thinking on the following lines
>> (include/linux/netdevice.h)
>>
>> +#define DEV_FEATURE_WORDS      2
>> +#define LEGACY_FEATURE_WORD    0
>>        /* currently active device features */
>> -       u32                     features;
>> +       u32                     features[DEV_FEATURE_WORDS];
>>        /* user-changeable features */
>> -       u32                     hw_features;
>> +       u32                     hw_features[DEV_FEATURE_WORDS];
>>        /* user-requested features */
>> -       u32                     wanted_features;
>> +       u32                     wanted_features[DEV_FEATURE_WORDS];
>>        /* VLAN feature mask */
>> -       u32                     vlan_features;
>> +       u32                     vlan_features[DEV_FEATURE_WORDS];
>
> Hmm. There might be no point in making features field an array.
> This gives us nothing really. Maybe just add features_2 or similar?
> If we ever get to the point there need to be more than two words for
> features we can think of some abstraction layer then.
>
That is right! Making it an array doesn't really buy us anything
unless there is a uniform way of managing all the bits spread across
multiple words inside that array. This is the reason why I changed
that array into a bitmap in the patch that I posted earlier. This way
the upper limit (currently only 32 bits) will be removed and we'll
have a long-term solution. There will be a little bit of work
involved, but 'doing-things-right' has a cost associated with it.

> Or we might add a new field and put there NETIF_F_LLTX, NETIF_F_HIGHDMA
> and others that are not user changeable ever. Those don't need dynamic
> propagation to slave devices (e.g. VLAN) and wanted/hw_features for them.
>
This will certainly buy us some time but will be a temporary fix until
we run out of bits again. Also, adding a second word (separate from the
first word) will create fragmentation and different approaches to
managing these two words, which (I think) won't be desirable.

Another approach would be to change this to u64 and postpone the
problem a little longer, and probably wait for u128 to postpone it
even longer. This is again a mid-term fix and not really a
solution.

In the patch that I have posted, I have changed these fields to bitmaps
and have a plan to take it there. This will _solve_ the problem once
and for all.


Thanks,
--mahesh..

> Best Regards,
> Michał Mirosław
>

^ permalink raw reply
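
The bitmap idea argued for above can be illustrated with a small userspace
model. This is a sketch under assumptions, not the posted patch: all demo_*
names and the feature count of 70 are hypothetical, and kernel code would
more likely use the existing DECLARE_BITMAP(), set_bit() and test_bit()
helpers.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_FEATURE_COUNT	70	/* hypothetical: more than 32 bits */
#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_FEATURE_LONGS \
	((DEMO_FEATURE_COUNT + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Feature set stored as a bitmap instead of a single u32. */
struct demo_features {
	unsigned long bits[DEMO_FEATURE_LONGS];
};

static void demo_feature_set(struct demo_features *f, unsigned int nr)
{
	f->bits[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

static void demo_feature_clear(struct demo_features *f, unsigned int nr)
{
	f->bits[nr / DEMO_BITS_PER_LONG] &= ~(1UL << (nr % DEMO_BITS_PER_LONG));
}

static bool demo_feature_test(const struct demo_features *f, unsigned int nr)
{
	return (f->bits[nr / DEMO_BITS_PER_LONG] >>
		(nr % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Callers address a feature by its bit number, so raising
DEMO_FEATURE_COUNT grows the storage without changing how bits are set or
tested. Callers must keep nr below DEMO_FEATURE_COUNT.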

* [PATCH net-next] cnic: Fix rtnl deadlock
From: Michael Chan @ 2011-04-08 17:44 UTC (permalink / raw)
  To: davem; +Cc: netdev

When cnic_stop_hw() -> cnic_cm_stop_bnx2x_hw() is called under rtnl_lock()
from NETDEV_DOWN event, it waits for cnic_delete_task() to complete.
It will deadlock when cnic_delete_task() takes rtnl_lock() before
calling cnic_ulp_stop_one().

We fix it by removing the rtnl_lock() in cnic_delete_task().
cnic_ulp_stop_one() has mutex and atomic bit ops to prevent important
operations from being done more than once, so it is not necessary to take
rtnl_lock().

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/cnic.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cnic.c b/drivers/net/cnic.c
index 5dfbff0..cde59b4 100644
--- a/drivers/net/cnic.c
+++ b/drivers/net/cnic.c
@@ -3983,9 +3983,7 @@ static void cnic_delete_task(struct work_struct *work)
 	if (test_and_clear_bit(CNIC_LCL_FL_STOP_ISCSI, &cp->cnic_local_flags)) {
 		struct drv_ctl_info info;
 
-		rtnl_lock();
 		cnic_ulp_stop_one(cp, CNIC_ULP_ISCSI);
-		rtnl_unlock();
 
 		info.cmd = DRV_CTL_ISCSI_STOPPED_CMD;
 		cp->ethdev->drv_ctl(dev->netdev, &info);
-- 
1.6.4.GIT



^ permalink raw reply related
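
The commit message above relies on atomic bit operations to keep the stop
path from running more than once. The pattern can be sketched with C11
atomics; this is an illustration only, assuming a hypothetical flag bit and
hypothetical demo_* names, with a counter standing in for the call to
cnic_ulp_stop_one(). It does not model the rtnl_lock() deadlock itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_FL_STOP_ISCSI	(1u << 0)	/* hypothetical flag bit */

static int demo_stop_calls;	/* how often the stop path actually ran */

/* Atomically clear `bit` in *flags; return true if it was set before. */
static bool demo_test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Work item body: only the caller that clears the flag does the stop. */
static void demo_delete_task(atomic_uint *flags)
{
	if (demo_test_and_clear(flags, DEMO_FL_STOP_ISCSI))
		demo_stop_calls++;	/* stands in for cnic_ulp_stop_one() */
}
```

Because the test and the clear happen in one atomic step, a second
invocation sees the flag already cleared and skips the stop, without
needing an outer lock for that purpose.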

* [PATCH] Add support for SMSC LAN9530, LAN9730 and LAN89530
From: Steve Glendinning @ 2011-04-08 18:51 UTC (permalink / raw)
  To: netdev; +Cc: Steve Glendinning

This patch adds support for SMSC's LAN9530, LAN9730 and LAN89530 USB
ethernet controllers to the existing smsc95xx driver by adding
their new USB VID/PID pairs.

Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
---
 drivers/net/usb/smsc95xx.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 727874d..47a6c87 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1313,6 +1313,21 @@ static const struct usb_device_id products[] = {
 		USB_DEVICE(0x0424, 0x9909),
 		.driver_info = (unsigned long) &smsc95xx_info,
 	},
+	{
+		/* SMSC LAN9530 USB Ethernet Device */
+		USB_DEVICE(0x0424, 0x9530),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC LAN9730 USB Ethernet Device */
+		USB_DEVICE(0x0424, 0x9730),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
+	{
+		/* SMSC LAN89530 USB Ethernet Device */
+		USB_DEVICE(0x0424, 0x9E08),
+		.driver_info = (unsigned long) &smsc95xx_info,
+	},
 	{ },		/* END */
 };
 MODULE_DEVICE_TABLE(usb, products);
-- 
1.7.2.5


^ permalink raw reply related
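
The patch above works purely by extending a VID/PID match table. A
userspace model of such a lookup might look like this; it is a sketch, with
struct demo_usb_id, demo_products[] and demo_match() as hypothetical names.
Only the three ID pairs and device names are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_usb_id {
	uint16_t vid;
	uint16_t pid;
	const char *name;
};

/* The three VID/PID pairs added by the patch. */
static const struct demo_usb_id demo_products[] = {
	{ 0x0424, 0x9530, "SMSC LAN9530 USB Ethernet Device" },
	{ 0x0424, 0x9730, "SMSC LAN9730 USB Ethernet Device" },
	{ 0x0424, 0x9E08, "SMSC LAN89530 USB Ethernet Device" },
};

/* Return the device name for a VID/PID pair, or NULL if it is unknown. */
static const char *demo_match(uint16_t vid, uint16_t pid)
{
	size_t i;

	for (i = 0; i < sizeof(demo_products) / sizeof(demo_products[0]); i++)
		if (demo_products[i].vid == vid && demo_products[i].pid == pid)
			return demo_products[i].name;

	return NULL;
}
```

Note that the LAN89530 entry uses product ID 0x9E08, so the ID cannot be
derived from the part name.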

* Re: Kernel panic when using bridge
From: Pallai Roland @ 2011-04-08 19:12 UTC (permalink / raw)
  To: Scot Doyle; +Cc: Sebastian Nickel, netdev
In-Reply-To: <4D9F2248.90300@scotdoyle.com>

2011/4/8 Scot Doyle <lkml@scotdoyle.com>:
> Perhaps the problem is isolated to the bridging code? Neither KVM guest nor
> associated tap device were running during my second reported panic.
> Here's a similar stacktrace from a third person:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620201
Yep, I'm the third person. :)

What I can tell you is that my servers have been stable since bridging was eliminated.

^ permalink raw reply

* Re: Kernel panic when using bridge
From: Stephen Hemminger @ 2011-04-08 19:17 UTC (permalink / raw)
  To: Scot Doyle; +Cc: netdev
In-Reply-To: <4D9E62D9.5010400@scotdoyle.com>

On Thu, 07 Apr 2011 20:20:25 -0500
Scot Doyle <lkml@scotdoyle.com> wrote:

> This kernel panic occurs when using a bridge. I would be grateful for 
> any ideas on how to correct it.
> 
> The panic was captured on two servers after three or four days of 
> minimal use, both configured as follows:
> - unpatched kernel 2.6.39-rc1 (commit 
> ecb78ab6f30106ab72a575a25b1cdfd1633b7ca2) with default .config options
> - br0 on single intel igb NIC (3 other NIC's unused)
> - br0 with ip address on distinct /27 subnet
> - br0:1 with ip address on distinct /24 subnet
> - br0:2 with ip address on distinct /24 subnet
> - no iptables rules
> - ebtables not installed
> 
> "net/bridge/br_netfilter.c" and "net/ipv4/ip_options.c" (in the current 
> 2.6.39-rc2 and in net-next-2.6) are identical to the versions used to 
> build this kernel.
> 

Please reproduce with exactly 2.6.39-rc2; there were some bug fixes
to make sure that the header was initialized.


-- 

^ permalink raw reply

* [PATCH net-next] cxgb4: drop phys_id interface and implement the newer set_phys_id
From: Dimitris Michailidis @ 2011-04-08 20:01 UTC (permalink / raw)
  To: netdev; +Cc: Dimitris Michailidis

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
---
 drivers/net/cxgb4/cxgb4_main.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index 5352c8a..0af9c9f 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -1336,15 +1336,20 @@ static int restart_autoneg(struct net_device *dev)
 	return 0;
 }
 
-static int identify_port(struct net_device *dev, u32 data)
+static int identify_port(struct net_device *dev,
+			 enum ethtool_phys_id_state state)
 {
+	unsigned int val;
 	struct adapter *adap = netdev2adap(dev);
 
-	if (data == 0)
-		data = 2;     /* default to 2 seconds */
+	if (state == ETHTOOL_ID_ACTIVE)
+		val = 0xffff;
+	else if (state == ETHTOOL_ID_INACTIVE)
+		val = 0;
+	else
+		return -EINVAL;
 
-	return t4_identify_port(adap, adap->fn, netdev2pinfo(dev)->viid,
-				data * 5);
+	return t4_identify_port(adap, adap->fn, netdev2pinfo(dev)->viid, val);
 }
 
 static unsigned int from_fw_linkcaps(unsigned int type, unsigned int caps)
@@ -2011,7 +2016,7 @@ static struct ethtool_ops cxgb_ethtool_ops = {
 	.set_sg            = ethtool_op_set_sg,
 	.get_link          = ethtool_op_get_link,
 	.get_strings       = get_strings,
-	.phys_id           = identify_port,
+	.set_phys_id       = identify_port,
 	.nway_reset        = restart_autoneg,
 	.get_sset_count    = get_sset_count,
 	.get_ethtool_stats = get_stats,
-- 
1.7.3.4


^ permalink raw reply related
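
The state handling introduced by the patch above can be isolated into a
small helper. This is a sketch under assumptions: the enum is a local
stand-in intended to mirror the kernel's enum ethtool_phys_id_state, and
demo_identify_val() is a hypothetical name. The values 0xffff and 0 and the
-EINVAL fallback are taken from the patch.

```c
#include <errno.h>

/* Local stand-in for the kernel's enum ethtool_phys_id_state. */
enum demo_phys_id_state {
	DEMO_ID_INACTIVE = 0,
	DEMO_ID_ACTIVE,
	DEMO_ID_ON,
	DEMO_ID_OFF,
};

/* Translate an identify request into the value passed to the firmware. */
static int demo_identify_val(enum demo_phys_id_state state, unsigned int *val)
{
	switch (state) {
	case DEMO_ID_ACTIVE:
		*val = 0xffff;
		return 0;
	case DEMO_ID_INACTIVE:
		*val = 0;
		return 0;
	default:
		/* ON/OFF toggling is not handled, as in the patch. */
		return -EINVAL;
	}
}
```

Unlike the old phys_id hook, the callback no longer receives a duration in
seconds; it only reports whether identification should start or stop.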

* Re: [PATCH net-next] cnic: Fix rtnl deadlock
From: David Miller @ 2011-04-08 20:03 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1302284654-25864-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Fri, 8 Apr 2011 10:44:14 -0700

> When cnic_stop_hw() -> cnic_cm_stop_bnx2x_hw() is called under rtnl_lock()
> from NETDEV_DOWN event, it waits for cnic_delete_task() to complete.
> It will deadlock when cnic_delete_task() takes rtnl_lock() before
> calling cnic_ulp_stop_one().
> 
> We fix it by removing the rtnl_lock() in cnic_delete_task().
> cnic_ulp_stop_one() has mutex and atomic bit ops to prevent important
> operations from being done more than once, so it is not necessary to take
> rtnl_lock().
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied, thanks Michael.

^ permalink raw reply

* Re: [PATCH] igb: restore EEPROM 16kB access limit
From: Stefan Assmann @ 2011-04-08 20:04 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net,
	Kirsher, Jeffrey T, Pieper, Jeffrey E, Ronciak, John
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F0137E85A8E@orsmsx508.amr.corp.intel.com>

On 08.04.2011 18:40, Wyborny, Carolyn wrote:
> 
> 
>> -----Original Message-----
>> From: Stefan Assmann [mailto:sassmann@kpanic.de]
>> Sent: Friday, April 08, 2011 6:35 AM
>> To: netdev@vger.kernel.org
>> Cc: e1000-devel@lists.sourceforge.net; Kirsher, Jeffrey T; Pieper,
>> Jeffrey E; Wyborny, Carolyn; Ronciak, John
>> Subject: [PATCH] igb: restore EEPROM 16kB access limit
>>
>> The check limiting the EEPROM access up to 16kB was removed by
>> commit 4322e561a93ec7ee034b603a6c610e7be90d4e8a. Without this check
>> the kernel will try to checksum the EEPROM up to 2MB (observed with
>> a 8086:10c9 NIC) and fail.
>>
>> igb 0000:03:00.0: 0 vfs allocated
>> igb 0000:03:00.0: The NVM Checksum Is Not Valid
>> ACPI: PCI interrupt for device 0000:03:00.0 disabled
>> igb: probe of 0000:03:00.0 failed with error -5
>>
>> Reason for that being an overflow in u16 e1000_nvm_info->nvm
>> while doing "nvm->word_size = 1 << size;" with size == 21.
>> Putting the check back in place.
>>
>> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
>> ---
>> drivers/net/igb/e1000_82575.c |    4 ++++
>> 1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/igb/e1000_82575.c
>> b/drivers/net/igb/e1000_82575.c
>> index 6b256c2..5cfa37f 100644
>> --- a/drivers/net/igb/e1000_82575.c
>> +++ b/drivers/net/igb/e1000_82575.c
>> @@ -244,6 +244,10 @@ static s32 igb_get_invariants_82575(struct e1000_hw
>> *hw)
>> 	 */
>> 	size += NVM_WORD_SIZE_BASE_SHIFT;
>>
>> +	/* EEPROM access above 16k is unsupported */
>> +	if (size > 14)
>> +		size = 14;
>> +
>> 	nvm->word_size = 1 << size;
>> 	if (nvm->word_size == (1 << 15))
>> 		nvm->page_size = 128;
>> --
>> 1.7.4
> NACK
> 
> This doesn't apply against current upstream RC kernel.  There was more changed in that commit than just the removal of this.  There is a missing section of code that is needed, but not this.  This starts at line 251 in e1000_82575.c

Carolyn,

the patch applies against latest net-next-2.6
(782d640afd15af7a1faf01cfe566ca4ac511319d).

How do you explain the behaviour observed in the patch description?

> 
> --snip--
>         /* NVM Function Pointers */
>         nvm->ops.acquire = igb_acquire_nvm_82575;
>         if (nvm->word_size < (1 << 15))
>                 nvm->ops.read = igb_read_nvm_eerd;
>         else
>                 nvm->ops.read = igb_read_nvm_spi;
> --snip--

Ok, so I assume some new NICs allow access beyond the 16k boundary.
In that case we should identify which NICs and treat them separately,
keeping the behaviour identical for the others.

  Stefan

^ permalink raw reply

* [PATCH net-next] cxgb4vf: drop phys_id interface and implement the newer set_phys_id
From: Dimitris Michailidis @ 2011-04-08 20:05 UTC (permalink / raw)
  To: netdev

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
---
 drivers/net/cxgb4vf/cxgb4vf_main.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c b/drivers/net/cxgb4vf/cxgb4vf_main.c
index 6aad64d..b0d037e 100644
--- a/drivers/net/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
@@ -1352,11 +1352,20 @@ static int cxgb4vf_set_rx_csum(struct net_device *dev, u32 csum)
 /*
  * Identify the port by blinking the port's LED.
  */
-static int cxgb4vf_phys_id(struct net_device *dev, u32 id)
+static int cxgb4vf_phys_id(struct net_device *dev,
+			   enum ethtool_phys_id_state state)
 {
+	unsigned int val;
 	struct port_info *pi = netdev_priv(dev);
 
-	return t4vf_identify_port(pi->adapter, pi->viid, 5);
+	if (state == ETHTOOL_ID_ACTIVE)
+		val = 0xffff;
+	else if (state == ETHTOOL_ID_INACTIVE)
+		val = 0;
+	else
+		return -EINVAL;
+
+	return t4vf_identify_port(pi->adapter, pi->viid, val);
 }
 
 /*
@@ -1588,7 +1597,7 @@ static struct ethtool_ops cxgb4vf_ethtool_ops = {
 	.set_sg			= ethtool_op_set_sg,
 	.get_link		= ethtool_op_get_link,
 	.get_strings		= cxgb4vf_get_strings,
-	.phys_id		= cxgb4vf_phys_id,
+	.set_phys_id		= cxgb4vf_phys_id,
 	.get_sset_count		= cxgb4vf_get_sset_count,
 	.get_ethtool_stats	= cxgb4vf_get_ethtool_stats,
 	.get_regs_len		= cxgb4vf_get_regs_len,
-- 
1.7.3.4


^ permalink raw reply related

* Re: [PATCH net-next] cxgb4: drop phys_id interface and implement the newer set_phys_id
From: David Miller @ 2011-04-08 20:06 UTC (permalink / raw)
  To: dm; +Cc: netdev
In-Reply-To: <1302292894-18629-1-git-send-email-dm@chelsio.com>

From: Dimitris Michailidis <dm@chelsio.com>
Date: Fri,  8 Apr 2011 13:01:34 -0700

> Signed-off-by: Dimitris Michailidis <dm@chelsio.com>

Applied, thanks.

^ permalink raw reply

