Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: Mandeep Baines @ 2007-09-04 17:21 UTC (permalink / raw)
  To: hadi
  Cc: Daniele Venzano, davem, rick.jones2, msb, netdev, grundler,
	robert.olsson, jeff, nhorman
In-Reply-To: <1188911036.4483.10.camel@localhost>

On 9/4/07, jamal <hadi@cyberus.ca> wrote:
> On Mon, 2007-03-09 at 20:20 -0700, Mandeep Singh Baines wrote:
>
> > I didn't see much saving in interrupts on my machine (too fast, I guess).
>
> You could try the idea suggested by Dave earlier and just turn interupts
> for every nth packet. That should cut down the numbers.
>

I wanted to do that but I couldn't figure out how to free the last skb
if it happened to be a transmit that didn't generate a tx completion
interrupt.

Let's say I interrupt every 4th packet.I've already sent packets 0 to
4. Now I send packets 5 and 6. I can't figure out how to ensure that
the skb's for these packets get freed in a deterministic amount of
time if there is no interrupt? They will get freed when packet 8 gets
transmitted but there is no upper bound on when that will happen.

Maybe I could create a tx skb cleanup timer that I kickoff in
hard_start_xmit(). Every call to hard_start_xmit would reset the
timer. I could try this for a future patch. Not sure if the extra code
would cost me more than I get back in savings.

> > I did see a significant boost to tx performance by optimizing start_xmit: more
> > than double pps in pktgen.
>
> 148Kpps on a slow piece of hardware aint bad - Good Stuff. I wonder how
> much CPU is being abused.
>
> If you wanna go one extra mile (separate future patch): get rid of that
> tx lock and use netif_tx_lock on the interupt path. Look at some sane
> driver like tg3 for reference.
>

Cool. I'll check out the tg3.

^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: Mandeep Baines @ 2007-09-04 17:24 UTC (permalink / raw)
  To: Daniele Venzano
  Cc: hadi, davem, rick.jones2, msb, netdev, grundler, robert.olsson,
	jeff, nhorman
In-Reply-To: <1188925008-ced672f60b90353067426d0b9f74506a@brownhat.org>

Cool. I'll try to see if I can clock my pc lower and run the
experiments again. I'll measure cpu utilization also this time around.
That should be useful for extrapolating.

Regards,
Mandeep

On 9/4/07, Daniele Venzano <venza@brownhat.org> wrote:
> ----- Message d'origine -----
> De: Mandeep Singh Baines <mandeep.baines@gmail.com>
> Date: Mon, 3 Sep 2007 20:20:36 -0700
> Sujet: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
>
> >Hi Daniele,
> >
> >Attached is a patch for converting the sis900 driver to NAPI. Please take a
> >look at let me know what you think. I'm not 100% sure if I'm handling the
> >rotting packet issue correctly or that I have the locking right in tx_timeout.
> >These might be areas to look at closely.
> >
> >I didn't see much saving in interrupts on my machine (too fast, I guess). But
> >I still think its beneficial: pushing work out of the interrupt handler into
> >a bottom half is a good thing and we no longer need to disable interrupts
> >in start_xmit.
> >
> >I did see a significant boost to tx performance by optimizing start_xmit: more
> >than double pps in pktgen.
> >
> >I'm also attaching some test results for various iterations of development.
>
> The patch looks good and I think it can be pushed higher (-mm ?) for some wider
> testing. I don't have the hardware available to do some tests myself,
> unfortunately, but it would be similar to yours anyway.
>
> I'd like to know how this works for people with less memory and slower CPU, but any
> kind of test run will be appreciated.
>
> Thanks, bye.
>
>
> --
> Daniele Venzano
> venza@brownhat.org
>

^ permalink raw reply

* Re: [PATCH 1/2]: [NET_SCHED]: Make all rate based scheduler work with TSO.
From: Bill Fink @ 2007-09-04 17:40 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jesper Dangaard Brouer, jdb, netdev@vger.kernel.org,
	David S. Miller
In-Reply-To: <46DD867E.6020307@trash.net>

On Tue, 04 Sep 2007, Patrick McHardy wrote:

> Bill Fink wrote:
> > On Sat, 1 Sep 2007, Jesper Dangaard Brouer wrote:
> >
> >>On Sat, 1 Sep 2007, Patrick McHardy wrote:
> >>>
> >>>It still won't work properly with TSO (TBF for example already drops
> >>>oversized packets during ->enqueue), but its a good cleanup anyway.
> >>
> >>Then lets call it a cleanup of the L2T macros.  In the next step we will 
> >>fix the different schedulers, to use the ability to lookup larger sized 
> >>packets. (I did notice the TBF scheduler would drop oversized packets).
> > 
> > Hmmm.  I guess this is also why TBF doesn't seem to work with 9000 byte
> > jumbo frames.
> > 
> > [root@lang4 ~]# tc qdisc add dev eth2 root tbf rate 2gbit buffer 5000000 limit 18000
> 
> Yes, you need to specify the MTU on the command line for
> jumbo frames.

Thanks!  Works much better now, although it does slightly exceed
the specified rate.

[root@lang4 ~]# tc qdisc add dev eth2 root tbf rate 2gbit buffer 5000000 limit 18000 mtu 9000

[root@lang4 ~]# ./nuttcp-5.5.5 -w10m 192.168.88.14
 2465.6729 MB /  10.08 sec = 2051.8241 Mbps 19 %TX 13 %RX

[root@lang4 ~]# ./nuttcp-5.5.5 -M1460 -w10m 192.168.88.14
 2785.5000 MB /  10.00 sec = 2335.6569 Mbps 100 %TX 26 %RX

						-Bill

^ permalink raw reply

* Re: 2.6.23-rc4-mm1 net bitops compile error
From: Jiri Slaby @ 2007-09-04 17:53 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <20070902091416.GJ16016@stusta.de>

Adrian Bunk napsal(a):
> defconfig fails with the following error on parisc:
> 
> <--  snip  -->
> 
> ...
>   CC      net/core/gen_estimator.o
> In file included from include2/asm/bitops.h:111,
>                  from /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/net/core/gen_estimator.c:18:
> /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/asm-generic/bitops/non-atomic.h: 
> In function '__set_bit':
> /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/asm-generic/bitops/non-atomic.h:17: 
> error: implicit declaration of function 'BIT_MASK'
> /home/bunk/linux/kernel-2.6/linux-2.6.23-rc4-mm1/include/asm-generic/bitops/non-atomic.h:18: 
> error: implicit declaration of function 'BIT_WORD'
> make[3]: *** [net/core/gen_estimator.o] Error 1
> 
> <--  snip  -->
> 
> Either #include <asm/bitops.h> must become forbidden and #error or the 
> move of the #define's to include/linux/bitops.h reverted.

Just to let you know, that I'm working on the former.

thanks,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz

^ permalink raw reply

* Re: 2.6.23-rc4-mm1
From: Zach Carter @ 2007-09-04 17:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, netdev, shemminger
In-Reply-To: <20070831215822.26e1432b.akpm@linux-foundation.org>


> +ioc3-program-uart-predividers.patch
> +sky2-fe-chip-support.patch
> +sky2-use-debugfs-rename.patch
> +sky2-document-gphy_ctrl-bits.patch
> +sky2-dont-restrict-config-space-access.patch
> +sky2-advanced-error-reporting.patch
> +sky2-use-pci_config-access-functions.patch
> +sky2-use-net_device-internal-stats.patch
> +ktime_sub_ns-analog-of-ktime_add_ns.patch
> +export-reciprocal_value-for-modules.patch
> +sky2-hardware-receive-timestamp-counter.patch
> +sky2-avoid-divide-in-receive-path.patch
> +sky2-118.patch

Folks,

I've got these messages since installing 2.6.23-rc4-mm1:

sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 5 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 5 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000
printk: 4 messages suppressed.
sky2 0000:07:00.0: error interrupt status=0x80000000

The laptop is a Sony VAIO SZ430N/B

Despite the errors, the interface appears to be working well enough.

I'd be happy to supply additional information, try out patches, or 
submit a bugzilla if needed.

thanks!

-Zach

^ permalink raw reply

* Re: r8169: slow samba performance
From: john @ 2007-09-04 20:04 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Bruce Cole, netdev
In-Reply-To: <20070903210322.GB4444@electric-eye.fr.zoreil.com>


On Mon, 3 Sep 2007, Francois Romieu wrote:

> john@BlueSkyTours.com <john@BlueSkyTours.com> :
> [...]
>> I have had abysmal performance trying to remotely run X apps via ssh on a
>> computer with a RTL8111 NIC.  Saw this message and decided to give this
>> patch a try --- success!  Much, much better.
>
> Can you give a try to:
>
> http://www.fr.zoreil.com/people/francois/misc/20070903-2.6.23-rc5-r8169-test.patch
>
> or just patches #0001 + #0002 at:
>
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc5/r8169-20070903/


20070903-2.6.23-rc5-r8169-test.patch applied against 2.6.23-rc5 works fine.
Performance is acceptable.

Would you like me to *just* try patches 1 & 2, to help narrow down anything?

Thanks,

John


^ permalink raw reply

* [PATCH] [RFC] allow admin/users to specify rto_min in milliseconds rather than jiffies
From: Rick Jones @ 2007-09-04 20:20 UTC (permalink / raw)
  To: netdev

Build upon David Miller's initial patches to set the per-route rto_min
so users can specify the rto_min in the same units (milliseconds) in
which they are displayed.  This is desirable because asking users to
convert to and from jiffies themselves, when there can be different
values of HZ from system to system would be error prone.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
---

*** ./lib/utils.c.orig	2007-09-04 13:11:01.000000000 -0700
--- ./lib/utils.c	2007-09-04 13:06:11.000000000 -0700
***************
*** 61,66 ****
--- 61,141 ----
  	return 0;
  }
  
+ /* liberally lifted from kernel/time.c. raj */
+ unsigned int jiffies_to_msecs(const unsigned long j)
+ {
+ 	if (__iproute2_hz_internal == 0)
+ 		__iproute2_hz_internal = __get_hz();
+  
+ 	if ((__iproute2_hz_internal <= 1000) && 
+ 		!(1000 % __iproute2_hz_internal))
+ 		return (1000 / __iproute2_hz_internal) * j;
+ 	else if (__iproute2_hz_internal > 1000 && 
+ 		!(__iproute2_hz_internal % 1000))
+ 		return (j + (__iproute2_hz_internal / 1000) - 1) /
+ 			(__iproute2_hz_internal / 1000);
+ 	else
+         	return (j * 1000) / __iproute2_hz_internal;
+ }
+ 
+ /* liberally lifted from kernel/time.c  raj */
+ unsigned long msecs_to_jiffies(const unsigned int m)
+ {
+ 
+ 	if (__iproute2_hz_internal == 0)
+ 		__iproute2_hz_internal = __get_hz();
+ 
+ 	if (__iproute2_hz_internal <= 1000 && 
+ 		!(1000 % __iproute2_hz_internal))
+         	/*
+          	 * HZ is equal to or smaller than 1000, and 1000 is a nice
+          	 * round multiple of HZ, divide with the factor between them,
+          	 * but round upwards:
+          	 */
+ 		return (m + (1000 / __iproute2_hz_internal) - 1) / 
+ 			(1000 / __iproute2_hz_internal);
+ 	else if (__iproute2_hz_internal > 1000 && 
+ 		!(__iproute2_hz_internal % 1000)) {
+         	/*
+           	 * HZ is larger than 1000, and HZ is a nice round multiple of
+          	 * 1000 - simply multiply with the factor between them.
+          	 *
+          	 * But first make sure the multiplication result cannot
+          	 * overflow:
+          	 */
+         	if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+                 	return MAX_JIFFY_OFFSET;
+ 
+ 		return m * (__iproute2_hz_internal / 1000);
+ 	}
+ 	else {
+         	/*
+          	 * Generic case - multiply, round and divide. But first
+          	 * check that if we are doing a net multiplication, that
+          	 * we wouldnt overflow:
+          	 */
+         	if (__iproute2_hz_internal > 1000 && 
+ 			m > jiffies_to_msecs(MAX_JIFFY_OFFSET))
+                 	return MAX_JIFFY_OFFSET;
+ 
+ 		return (m * __iproute2_hz_internal + 1000 - 1) / 1000;
+ 	}
+ }
+ 
+ int get_unsigned_ms_to_jiffies(unsigned *val, const char *arg, int base)
+ {
+ 	int ret;
+ 	unsigned myval = *val;
+ 
+ 	if (__iproute2_hz_internal == 0)
+ 		__iproute2_hz_internal = __get_hz();
+ 	ret = get_unsigned(&myval,arg,base);
+ 	if (ret == -1 || myval > (UINT_MAX / __iproute2_hz_internal))
+ 		return -1;
+ 	*val = msecs_to_jiffies(myval);
+ 	return 0;
+ }	
+ 
  int get_u64(__u64 *val, const char *arg, int base)
  {
  	unsigned long long res;
*** ./include/utils.h.orig	2007-09-04 12:56:09.000000000 -0700
--- ./include/utils.h	2007-09-04 12:55:56.000000000 -0700
***************
*** 69,74 ****
--- 69,78 ----
  	u_int8_t  ipx_node[IPX_NODE_LEN];
  };
  
+ #ifndef MAX_JIFFY_OFFSET
+ #define MAX_JIFFY_OFFSET ((LONG_MAX >> 1)-1)
+ #endif
+ 
  extern __u32 get_addr32(const char *name);
  extern int get_addr_1(inet_prefix *dst, const char *arg, int family);
  extern int get_prefix_1(inet_prefix *dst, char *arg, int family);
***************
*** 77,82 ****
--- 81,87 ----
  
  extern int get_integer(int *val, const char *arg, int base);
  extern int get_unsigned(unsigned *val, const char *arg, int base);
+ extern int get_unsigned_ms_to_jiffies(unsigned *val, const char *arg, int base);
  #define get_byte get_u8
  #define get_ushort get_u16
  #define get_short get_s16
*** ./include/linux/rtnetlink.h.orig	2007-03-13 14:50:56.000000000 -0700
--- ./include/linux/rtnetlink.h	2007-08-31 13:27:36.000000000 -0700
***************
*** 352,357 ****
--- 352,359 ----
  #define RTAX_INITCWND RTAX_INITCWND
  	RTAX_FEATURES,
  #define RTAX_FEATURES RTAX_FEATURES
+ 	RTAX_RTO_MIN,
+ #define RTAX_RTO_MIN RTAX_RTO_MIN
  	__RTAX_MAX
  };
  
*** ./ip/iproute.c.orig	2007-08-31 13:15:45.000000000 -0700
--- ./ip/iproute.c	2007-09-04 13:02:42.000000000 -0700
***************
*** 51,56 ****
--- 51,57 ----
  	[RTAX_HOPLIMIT] = "hoplimit",
  	[RTAX_INITCWND] = "initcwnd",
  	[RTAX_FEATURES] = "features",
+ 	[RTAX_RTO_MIN]	= "rto_min",
  };
  static void usage(void) __attribute__((noreturn));
  
***************
*** 74,79 ****
--- 75,81 ----
  	fprintf(stderr, "           [ rtt NUMBER ] [ rttvar NUMBER ]\n");
  	fprintf(stderr, "           [ window NUMBER] [ cwnd NUMBER ] [ initcwnd NUMBER ]\n");
  	fprintf(stderr, "           [ ssthresh NUMBER ] [ realms REALM ]\n");
+ 	fprintf(stderr, "           [ rto_min NUMBER ]\n");
  	fprintf(stderr, "TYPE := [ unicast | local | broadcast | multicast | throw |\n");
  	fprintf(stderr, "          unreachable | prohibit | blackhole | nat ]\n");
  	fprintf(stderr, "TABLE_ID := [ local | main | default | all | NUMBER ]\n");
***************
*** 516,522 ****
  			if (mxlock & (1<<i))
  				fprintf(fp, " lock");
  
! 			if (i != RTAX_RTT && i != RTAX_RTTVAR)
  				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
  			else {
  				unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
--- 518,525 ----
  			if (mxlock & (1<<i))
  				fprintf(fp, " lock");
  
! 			if (i != RTAX_RTT && i != RTAX_RTTVAR &&
! 			    i != RTAX_RTO_MIN)
  				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
  			else {
  				unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
***************
*** 524,530 ****
  				val *= 1000;
  				if (i == RTAX_RTT)
  					val /= 8;
! 				else
  					val /= 4;
  				if (val >= hz)
  					fprintf(fp, " %ums", val/hz);
--- 527,533 ----
  				val *= 1000;
  				if (i == RTAX_RTT)
  					val /= 8;
! 				else if (i == RTAX_RTTVAR)
  					val /= 4;
  				if (val >= hz)
  					fprintf(fp, " %ums", val/hz);
***************
*** 823,828 ****
--- 826,840 ----
  			if (get_unsigned(&rtt, *argv, 0))
  				invarg("\"rtt\" value is invalid\n", *argv);
  			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_RTT, rtt);
+ 		} else if (strcmp(*argv, "rto_min") == 0) {
+ 			unsigned rto_min;
+ 			NEXT_ARG();
+ 			mxlock |= (1<<RTAX_RTO_MIN);
+ 			if (get_unsigned_ms_to_jiffies(&rto_min, *argv, 0))
+ 				invarg("\"rto_min\" value is invalid\n",
+ 				       *argv);
+ 			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_RTO_MIN,
+ 				      rto_min);
  		} else if (matches(*argv, "window") == 0) {
  			unsigned win;
  			NEXT_ARG();

^ permalink raw reply

* RE: 82557/8/9 Ethernet Pro 100 interrupt mitigation support
From: Brandeburg, Jesse @ 2007-09-04 20:32 UTC (permalink / raw)
  To: John Sigler; +Cc: Kok, Auke-jan H, netdev, linux-net
In-Reply-To: <46DD8AC5.1070508@free.fr>

John Sigler wrote:
> Jesse Brandeburg wrote:
> 
>> Auke Kok wrote:
>> 
>>> Marc Sigler wrote:
>>> 
>>>> I have several systems with three integrated Intel 82559 (I
>>>> *think*). 
>>>> 
>>>> Does someone know if these boards support hardware interrupt
>>>> mitigation? I.e. is it possible to configure them to raise an IRQ
>>>> only if their hardware buffer is full OR if some given time (say 1
>>>> ms) has passed and packets are available in their hardware buffer.
>>>> 
>>>> I've been using the eepro100 driver up to now, but I'm about to try
>>>> the e100 driver. Would I have to use NAPI? Or is this an orthogonal
>>>> feature?
>>> 
>>> e100 hardware (as far as I can see from the specs) doesn't support
>>> any irq mitigation, so you'll need to run in NAPI mode if you want
>>> to throttle irq's. the in-kernel e100 already runs in NAPI mode, so
>>> that's already covered.
>>> 
>>> beware that the eepro100 driver is scheduled for removal (2.6.25 or
>>> so). 
>> 
>> We support mitigation of interrupts in a downloadable microcode on
>> only a few pieces of hardware (revision id specific) in e100.c (see
>> e100_setup_ucode)
> 
> http://lxr.linux.no/source/drivers/net/e100.c#L1176
> 
> OK.
> 
> How do I tell which revision id I have?
> 
> 00:08.0 0200: 8086:1229 (rev 08)
> 00:09.0 0200: 8086:1229 (rev 08)
> 00:0a.0 0200: 8086:1229 (rev 08)
                           ^^^^^^ 
rev 8

> How much memory is available on the board to bundle packets? 3000
> bytes? 

yes, well, 3kB.  I don't remember if the chip actually bundles
interrupts after the dma complets or pends the dma while delaying the
interrupt (I would guess it is the former)

>> If you really really wanted mitigation you could probably backport
>> the microcode from the e100 driver in the 2.4.35 kernel for your
>> specific hardware.  This driver is versioned 2.X.
> 
> I forgot to mention I'm running 2.6.22.1-rt9.
> I'm not sure why you mention 2.4.35?

because 2.4.35 has the "old" e100 driver from Intel, which has all sorts
of support (and all sorts of complexity) for offloading and interrupt
bundling, etc.

> The problem with e100 is that it fails to properly set up all three
> interfaces, which is why I'm stuck with eepro100.

I remember working with you on this, but we left the issue with no
solution because it appears your hardware has some specific issue with
strange eeprom timings that we cannot reproduce.

you can try loading the ucode (that matches your revision id) using
eepro100.

Jesse

^ permalink raw reply

* Re: r8169: slow samba performance
From: Francois Romieu @ 2007-09-04 20:37 UTC (permalink / raw)
  To: john; +Cc: Bruce Cole, netdev
In-Reply-To: <Pine.SOC.4.64.0709041402130.871@katana>

john@BlueSkyTours.com <john@BlueSkyTours.com> :
[...]
> 20070903-2.6.23-rc5-r8169-test.patch applied against 2.6.23-rc5 works fine.
> Performance is acceptable.

Does "acceptable" mean that there is a noticeable difference when compared
to the patch based on a busy-waiting loop ?

> Would you like me to *just* try patches 1 & 2, to help narrow down anything?

I expect patch #2 alone to be enough to enhance the performance. If it gets
proven, the patch would be a good candidate for a quick merge upstream.

-- 
Ueimor

^ permalink raw reply

* Re: r8169: slow samba performance
From: john @ 2007-09-04 21:00 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Bruce Cole, netdev
In-Reply-To: <20070904203742.GB31783@electric-eye.fr.zoreil.com>


On Tue, 4 Sep 2007, Francois Romieu wrote:

> john@BlueSkyTours.com <john@BlueSkyTours.com> :
> [...]
>> 20070903-2.6.23-rc5-r8169-test.patch applied against 2.6.23-rc5 works fine.
>> Performance is acceptable.
>
> Does "acceptable" mean that there is a noticeable difference when compared
> to the patch based on a busy-waiting loop ?


Without this patch, latency in bringing up emacs, or in display of pages in
firefox is extremely high.  With the patch, latency is pretty much what I
see when using an old tulip based NIC.

Is there a specific test you wish me to try?


>> Would you like me to *just* try patches 1 & 2, to help narrow down anything?
>
> I expect patch #2 alone to be enough to enhance the performance. If it gets
> proven, the patch would be a good candidate for a quick merge upstream.


Okay, I will build another kernel with just #2 applied.


John



^ permalink raw reply

* Re: 2.6.23-rc4-mm1
From: Stephen Hemminger @ 2007-09-04 21:36 UTC (permalink / raw)
  To: Zach Carter; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <200709041054.32859.linux@zachcarter.com>

On Tue, 4 Sep 2007 10:54:32 -0700
Zach Carter <linux@zachcarter.com> wrote:

> 
> > +ioc3-program-uart-predividers.patch
> > +sky2-fe-chip-support.patch
> > +sky2-use-debugfs-rename.patch
> > +sky2-document-gphy_ctrl-bits.patch
> > +sky2-dont-restrict-config-space-access.patch
> > +sky2-advanced-error-reporting.patch
> > +sky2-use-pci_config-access-functions.patch
> > +sky2-use-net_device-internal-stats.patch
> > +ktime_sub_ns-analog-of-ktime_add_ns.patch
> > +export-reciprocal_value-for-modules.patch
> > +sky2-hardware-receive-timestamp-counter.patch

I already told Andrew to please drop this last patch, because
it causes interrupt messages. It seems masking off the IRQ
in hardware doesn't prevent that interrupt!

^ permalink raw reply

* Re: r8169: slow samba performance
From: Bruce Cole @ 2007-09-04 21:51 UTC (permalink / raw)
  To: Francois Romieu; +Cc: john, netdev
In-Reply-To: <20070904203742.GB31783@electric-eye.fr.zoreil.com>

Francois Romieu wrote:
> Does "acceptable" mean that there is a noticeable difference when compared
> to the patch based on a busy-waiting loop ?
>
>   
>> Would you like me to *just* try patches 1 & 2, to help narrow down anything?
>>     
>
> I expect patch #2 alone to be enough to enhance the performance. If it gets
> proven, the patch would be a good candidate for a quick merge upstream.
>
>   
Patch #0002 looks functionally equivalent to the patch I already pointed 
folks
to and which I showed as being sufficient to address the TX queue problem.
The fix has also already been confirmed by shane, that fix being:

diff -c r/r8169.c r3/r8169-out.c
*** r/r8169.c   2007-08-18 11:54:58.000000000 -0700
--- r3/r8169-out.c      2007-09-04 14:23:49.000000000 -0700
***************
*** 2646,2651 ****
--- 2646,2655 ----
                if (netif_queue_stopped(dev) &&
                    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
                        netif_wake_queue(dev);
+               } else if (dirty_tx != tp->cur_tx) {
+                       netif_tx_lock(dev);
+                       RTL_W8(TxPoll, NPQ);
+                       netif_tx_unlock(dev);
                }
        }
  }

In any case, I've tried your latest version of the patch,
0002-r8169-workaround-against-ignored-TxPoll-writes-8168.patch, and it alone
works as well.

I'm not sure why you describe this as being an "8168 hack", given that the
problem has been seen with the 8111b chip (I have an 8111b chip on my 
gigabyte
motherboard).

Now since this change heals the TX queue stall, it would seem that the real
underlying problem involves a race condition with enqueueing to the TX queue
while the controller is processing the queue.  The ultimate fix for that 
I bet
is either to address locking at TX enqueue time, or there is a 
controller bug.
Any clarification from realtek on the necessary processing for the NPQ 
bit, or
a known controller problem?

PS: I've also received private email that this problem pertains to video
streaming (to a Kiss DVD player) not just samba or X11 traffic.  Basically
most all high-level TCP based protocols are affected it seems.  This serious
performance problem should be considered to impact a lot more than just 
samba
users.

^ permalink raw reply

* SO_BINDTODEVICE mismatch with man page & comments.
From: Ben Greear @ 2007-09-04 22:45 UTC (permalink / raw)
  To: NetDev; +Cc: Patrick McHardy

According to the comment in the net/core/sock.c code (in 2.6.20), I should be able to pass a zero
optlen to the setsockopt method for SO_BINDTODEVICE:

                 case SO_BINDTODEVICE:
                 {
                         char devname[IFNAMSIZ];

                         /* Sorry... */
                         if (!capable(CAP_NET_RAW)) {
                                 ret = -EPERM;
                                 break;
                         }

                         /* Bind this socket to a particular device like "eth0",
                          * as specified in the passed interface name. If the
                          * name is "" or the option length is zero the socket
                          * is not bound.
                          */

However, earlier in that method it returns -EINVAL if optlen is < sizeof(int).

The man page has comments similar to that in the code above.

Also, even when I get the un-bind call working with code similar to:

int z = 0;
setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, &z, sizeof(z));

The app I'm working on (Xorp) does not appear to work.  Perhaps because
the kernel does not clean up the cached route when you un-bind
as it does in the (re)bind logic?

                                 /* Remove any cached route for this socket. */
                                 sk_dst_reset(sk);

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH] spidernet: fix interrupt reason recognition
From: Linas Vepstas @ 2007-09-04 23:49 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Ishizaki Kou, netdev, cbe-oss-dev
In-Reply-To: <46D7F179.5070808@garzik.org>

On Fri, Aug 31, 2007 at 06:46:17AM -0400, Jeff Garzik wrote:
> Ishizaki Kou wrote:
> >This patch solves a problem that the spidernet driver sometimes fails
> >to handle IRQ.
> >
> >The problem happens because,
> >- In Cell architecture, interrupts may arrive at an interrupt
> >  controller, even if they are masked by the setting on registers of
> >  devices. It happens when interrupt packets are sent just before 
> >  the interrupts are masked.
> >- spidernet interrupt handler compares interrupt reasons with
> >  interrupt masks, so when such interrupts occurs, spidernet interrupt
> >  handler returns IRQ_NONE.
> >- When all of interrupt handler return IRQ_NONE, linux kernel disables
> >  the IRQ and it no longer delivers interrupts to the interrupt handlers.
> >
> >spidernet doesn't work after above sequence, because it can't receive
> >interrupts.
> > 
> >This patch changes spidernet interrupt handler that it compares
> >interrupt reason with SPIDER_NET_INTX_MASK_VALUE.
> >
> >Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
> >---
> >
> >Linas-san,
> >
> >Please apply this to 2.6.23. Because this problem is sometimes happens
> >and we cannot use the ethernet port any more.
> >
> >And also, please apply the following Arnd-san's patch to fix a problem
> >that spidernet driver sometimes causes a BUG_ON at open.
> 
> Linas?  ACK?  Alive?  :)

Argh. I read the code; it looked fine. I was going to compile it and
forward it formally and etc. and then I got busy ...

Ack'ed by: Linas Vepstas <linas@austin.ibm.com>

--linas

^ permalink raw reply

* [PATCH] Remove unneeded pointer iph from ipcomp6_input() in net/ipv6/ipcomp6.c
From: Micah Gruber @ 2007-09-05  5:44 UTC (permalink / raw)
  To: linux-kernel, netdev

This trivial patch removes the unneeded pointer iph, which is never used.

Signed-off-by: Micah Gruber < micah.gruber@gmail.com>
---

--- a/net/ipv6/ipcomp6.c	2007-09-04 23:18:43.000000000 +0800
+++ b/net/ipv6/ipcomp6.c	2007-09-05 00:48:05.000000000 +0800
@@ -65,7 +65,6 @@
 static int ipcomp6_input(struct xfrm_state *x, struct sk_buff *skb)
 {
 	int err = -ENOMEM;
-	struct ipv6hdr *iph;
 	struct ipv6_comp_hdr *ipch;
 	int plen, dlen;
 	struct ipcomp_data *ipcd = x->data;
@@ -79,7 +78,6 @@
 	skb->ip_summed = CHECKSUM_NONE;
 
 	/* Remove ipcomp header and decompress original payload */
-	iph = ipv6_hdr(skb);
 	ipch = (void *)skb->data;
 	skb->transport_header = skb->network_header + sizeof(*ipch);
 	__skb_pull(skb, sizeof(*ipch));

^ permalink raw reply

* [PATCH] Remove unneeded pointer newdp from dccp_v4_request_recv_sock() in net/dccp/ipv4.c
From: Micah Gruber @ 2007-09-05  5:47 UTC (permalink / raw)
  To: linux-kernel, netdev

This trivial patch removes the unneeded pointer newdp, which is never used.

Signed-off-by: Micah Gruber <micah.gruber@gmail.com>
---

--- a/net/dccp/ipv4.c	2007-09-04 23:18:42.000000000 +0800
+++ b/net/dccp/ipv4.c	2007-09-05 00:49:54.000000000 +0800
@@ -381,7 +381,6 @@
 {
 	struct inet_request_sock *ireq;
 	struct inet_sock *newinet;
-	struct dccp_sock *newdp;
 	struct sock *newsk;
 
 	if (sk_acceptq_is_full(sk))
@@ -396,7 +395,6 @@
 
 	sk_setup_caps(newsk, dst);
 
-	newdp		   = dccp_sk(newsk);
 	newinet		   = inet_sk(newsk);
 	ireq		   = inet_rsk(req);
 	newinet->daddr	   = ireq->rmt_addr;


^ permalink raw reply

* Re: [PATCH] [RFC] allow admin/users to specify rto_min in milliseconds rather than jiffies
From: Stephen Hemminger @ 2007-09-05  6:38 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev
In-Reply-To: <200709042020.NAA29612@tardy.cup.hp.com>

On Tue, 4 Sep 2007 13:20:47 -0700 (PDT)
Rick Jones <rick.jones2@hp.com> wrote:

> Build upon David Miller's initial patches to set the per-route rto_min
> so users can specify the rto_min in the same units (milliseconds) in
> which they are displayed.  This is desirable because asking users to
> convert to and from jiffies themselves, when there can be different
> values of HZ from system to system would be error prone.
> 
> Signed-off-by: Rick Jones <rick.jones2@hp.com>
> ---

The api in netlink should be in milliseconds rather than compensating
in the application (iproute2).

^ permalink raw reply

* Re: [patch 02/18] sundance: PHY address form 0, only for device I D 0x0200 (IP100A) (20070605)
From: Jesse Huang @ 2007-09-05 18:39 UTC (permalink / raw)
  To: jeff, akpm, netdev, jesse

Dear Jeff:
We found current sundance.c in kernel 2.6.22 is working fine for IP100A.
We need not to modify following codes:
> -       for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
> +       if (sundance_pci_tbl[np->chip_id].device == 0x0200)
> +               phy = 0;
> +       else
> +               phy = 1;
> +       for (; phy <= 32 && phy_idx < MII_CNT; phy++) {


Current code will find IP100A at 0 when phy was equ to 32.
for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
		int phyx = phy & 0x1f;
		int mii_status = mdio_read(dev, phyx, MII_BMSR);
		if (mii_status != 0xffff  &&  mii_status != 0x0000) {
			np->phys[phy_idx++] = phyx;
			np->mii_if.advertising = mdio_read(dev, phyx, MII_ADVERTISE);
			if ((mii_status & 0x0040) == 0)
				np->mii_preamble_required++;
			printk(KERN_INFO "%s: MII PHY found at address %d, status "
				   "0x%4.4x advertising %4.4x.\n",
				   dev->name, phyx, mii_status, np->mii_if.advertising);
		}
	}

So, we hope you can drop this patch "[patch 02/18] sundance: PHY address
form 0, only for device I D 0x0200 (IP100A) (20070605)"

Thanks a lot!

Best Regards,
Jesse Huang

-----Original Message-----
From: Jeff Garzik [mailto:jeff@garzik.org] 
Sent: Monday, September 03, 2007 6:43 PM
To: ¶À«Ø¿³-Jesse
Cc: akpm@linux-foundation.org; netdev@vger.kernel.org
Subject: Re: [patch 02/18] sundance: PHY address form 0, only for device
I D 0x0200 (IP100A) (20070605)

¶À«Ø¿³-Jesse wrote:
> +++ a/drivers/net/sundance.c
> @@ -559,7 +559,11 @@ static int __devinit sundance_probe1 (st
>          * It seems some phys doesn't deal well with address 0 being
> accessed
>          * first, so leave address zero to the end of the loop (32 &
31).
>          */
> -       for (phy = 1; phy <= 32 && phy_idx < MII_CNT; phy++) {
> +       if (sundance_pci_tbl[np->chip_id].device == 0x0200)
> +               phy = 0;
> +       else
> +               phy = 1;
> +       for (; phy <= 32 && phy_idx < MII_CNT; phy++) {


As I noted in the last email, this patch still has a bug:

In the above loop code being modified, you need to change two things:

1) Initial loop value
2) Loop terminating condition

Your patch only performs change #1.

	Jeff




^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: Mandeep Singh Baines @ 2007-09-05  7:44 UTC (permalink / raw)
  To: Daniele Venzano
  Cc: Mandeep Singh Baines, hadi, davem, rick.jones2, msb, netdev,
	grundler, robert.olsson, jeff, nhorman
In-Reply-To: <1188925008-ced672f60b90353067426d0b9f74506a@brownhat.org>

[-- Attachment #1: Type: text/plain, Size: 1137 bytes --]

Daniele Venzano (venza@brownhat.org) wrote:
> The patch looks good and I think it can be pushed higher (-mm ?) for some wider 
> testing. I don't have the hardware available to do some tests myself, 
> unfortunately, but it would be similar to yours anyway.
> 
> I'd like to know how this works for people with less memory and slower CPU, but any 
> kind of test run will be appreciated.

Hi Daniele,

I tested the driver under different setting of CPU clock. Under a lower clock,
I get less interrupts (what I was hoping for) and slightly less performance.
Under higher clock, I get better performance and almost 1 interrupt per packet.

I also made a tiny change to the driver which resulted in slightly better
performance and less interrupts. I made a one-line change to finish_xmit,
which now wakes up the transmit queue if there are at least 16 available
slots in the tx ring. Previously, the driver would wake up the transmit queue
if there was just a single slot available. This should result in better
hysteresis.

Attached are my test results and a new patch.

Signed-off-by: Mandeep Singh Baines <mandeep.baines@gmail.com>

--

[-- Attachment #2: sis900_napi_v2.patch --]
[-- Type: text/plain, Size: 19802 bytes --]

diff --git a/drivers/net/sis900.c b/drivers/net/sis900.c
index 7c6e480..40c1aee 100644
--- a/drivers/net/sis900.c
+++ b/drivers/net/sis900.c
@@ -185,7 +185,6 @@ struct sis900_private {
 	dma_addr_t tx_ring_dma;
 	dma_addr_t rx_ring_dma;
 
-	unsigned int tx_full; /* The Tx queue is full. */
 	u8 host_bridge_rev;
 	u8 chipset_rev;
 };
@@ -202,8 +201,10 @@ MODULE_PARM_DESC(max_interrupt_work, "SiS 900/7016 maximum events handled per in
 MODULE_PARM_DESC(sis900_debug, "SiS 900/7016 bitmapped debugging message level");
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-static void sis900_poll(struct net_device *dev);
+static void sis900_poll_controller(struct net_device *dev);
 #endif
+
+static int sis900_poll(struct net_device *dev, int *budget);
 static int sis900_open(struct net_device *net_dev);
 static int sis900_mii_probe (struct net_device * net_dev);
 static void sis900_init_rxfilter (struct net_device * net_dev);
@@ -216,8 +217,8 @@ static void sis900_tx_timeout(struct net_device *net_dev);
 static void sis900_init_tx_ring(struct net_device *net_dev);
 static void sis900_init_rx_ring(struct net_device *net_dev);
 static int sis900_start_xmit(struct sk_buff *skb, struct net_device *net_dev);
-static int sis900_rx(struct net_device *net_dev);
-static void sis900_finish_xmit (struct net_device *net_dev);
+static int sis900_rx(struct net_device *net_dev, int limit);
+static int sis900_finish_xmit (struct net_device *net_dev);
 static irqreturn_t sis900_interrupt(int irq, void *dev_instance);
 static int sis900_close(struct net_device *net_dev);
 static int mii_ioctl(struct net_device *net_dev, struct ifreq *rq, int cmd);
@@ -474,9 +475,11 @@ static int __devinit sis900_probe(struct pci_dev *pci_dev,
 	net_dev->tx_timeout = sis900_tx_timeout;
 	net_dev->watchdog_timeo = TX_TIMEOUT;
 	net_dev->ethtool_ops = &sis900_ethtool_ops;
+	net_dev->poll = &sis900_poll;
+	net_dev->weight = 64;
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-        net_dev->poll_controller = &sis900_poll;
+        net_dev->poll_controller = &sis900_poll_controller;
 #endif
 
 	if (sis900_debug > 0)
@@ -979,13 +982,44 @@ static u16 sis900_reset_phy(struct net_device *net_dev, int phy_addr)
 	return status;
 }
 
+static int sis900_poll(struct net_device *dev, int *budget)
+{
+	struct sis900_private *sis_priv = dev->priv;
+	long ioaddr = dev->base_addr;
+	int limit = min_t(int, dev->quota, *budget);
+	int rx_work_done;
+	int tx_work_done;
+
+	/* run TX completion thread */
+	spin_lock(sis_priv->lock);
+	tx_work_done = sis900_finish_xmit(dev);
+	spin_unlock(sis_priv->lock);
+
+	/* run RX thread */
+	rx_work_done = sis900_rx(dev, limit);
+	*budget -= rx_work_done;
+	dev->quota -= rx_work_done;
+
+	/* re-enable interrupts if no work done */
+	if (rx_work_done + tx_work_done == 0) {
+		netif_rx_complete(dev);
+		/* Enable all known interrupts. */
+		outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr);
+		/* Handle rotting packet */
+		sis900_rx(dev, NUM_RX_DESC);
+		return 0;
+	}
+
+	return 1;
+}
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
 /*
  * Polling 'interrupt' - used by things like netconsole to send skbs
  * without having to re-enable interrupts. It's not called while
  * the interrupt routine is executing.
 */
-static void sis900_poll(struct net_device *dev)
+static void sis900_poll_controller(struct net_device *dev)
 {
 	disable_irq(dev->irq);
 	sis900_interrupt(dev->irq, dev);
@@ -1032,7 +1066,7 @@ sis900_open(struct net_device *net_dev)
 	sis900_set_mode(ioaddr, HW_SPEED_10_MBPS, FDX_CAPABLE_HALF_SELECTED);
 
 	/* Enable all known interrupts by setting the interrupt mask. */
-	outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxIDLE), ioaddr + imr);
+	outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr);
 	outl(RxENA | inl(ioaddr + cr), ioaddr + cr);
 	outl(IE, ioaddr + ier);
 
@@ -1102,7 +1136,6 @@ sis900_init_tx_ring(struct net_device *net_dev)
 	long ioaddr = net_dev->base_addr;
 	int i;
 
-	sis_priv->tx_full = 0;
 	sis_priv->dirty_tx = sis_priv->cur_tx = 0;
 
 	for (i = 0; i < NUM_TX_DESC; i++) {
@@ -1517,7 +1550,6 @@ static void sis900_tx_timeout(struct net_device *net_dev)
 {
 	struct sis900_private *sis_priv = net_dev->priv;
 	long ioaddr = net_dev->base_addr;
-	unsigned long flags;
 	int i;
 
 	if(netif_msg_tx_err(sis_priv))
@@ -1527,8 +1559,8 @@ static void sis900_tx_timeout(struct net_device *net_dev)
 	/* Disable interrupts by clearing the interrupt mask. */
 	outl(0x0000, ioaddr + imr);
 
-	/* use spinlock to prevent interrupt handler accessing buffer ring */
-	spin_lock_irqsave(&sis_priv->lock, flags);
+	/* use spinlock to prevent bh from accessing buffer ring */
+	spin_lock_bh(&sis_priv->lock);
 
 	/* discard unsent packets */
 	sis_priv->dirty_tx = sis_priv->cur_tx = 0;
@@ -1546,10 +1578,9 @@ static void sis900_tx_timeout(struct net_device *net_dev)
 			sis_priv->stats.tx_dropped++;
 		}
 	}
-	sis_priv->tx_full = 0;
 	netif_wake_queue(net_dev);
 
-	spin_unlock_irqrestore(&sis_priv->lock, flags);
+	spin_unlock_bh(&sis_priv->lock);
 
 	net_dev->trans_start = jiffies;
 
@@ -1557,7 +1588,7 @@ static void sis900_tx_timeout(struct net_device *net_dev)
 	outl(sis_priv->tx_ring_dma, ioaddr + txdp);
 
 	/* Enable all known interrupts by setting the interrupt mask. */
-	outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxIDLE), ioaddr + imr);
+	outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr);
 	return;
 }
 
@@ -1574,52 +1605,27 @@ static void sis900_tx_timeout(struct net_device *net_dev)
 static int
 sis900_start_xmit(struct sk_buff *skb, struct net_device *net_dev)
 {
-	struct sis900_private *sis_priv = net_dev->priv;
 	long ioaddr = net_dev->base_addr;
-	unsigned int  entry;
-	unsigned long flags;
-	unsigned int  index_cur_tx, index_dirty_tx;
-	unsigned int  count_dirty_tx;
+	struct sis900_private *sis_priv = net_dev->priv;
+        unsigned int entry;
 
-	/* Don't transmit data before the complete of auto-negotiation */
-	if(!sis_priv->autong_complete){
+	/* Don't transmit data before auto-negotiation is complete */
+	if(unlikely(!sis_priv->autong_complete)){
 		netif_stop_queue(net_dev);
 		return 1;
 	}
 
-	spin_lock_irqsave(&sis_priv->lock, flags);
-
-	/* Calculate the next Tx descriptor entry. */
-	entry = sis_priv->cur_tx % NUM_TX_DESC;
+	/* Set the Tx descriptor and enable Transmit State Machine */
+        entry = sis_priv->cur_tx++ % NUM_TX_DESC;
 	sis_priv->tx_skbuff[entry] = skb;
-
-	/* set the transmit buffer descriptor and enable Transmit State Machine */
 	sis_priv->tx_ring[entry].bufptr = pci_map_single(sis_priv->pci_dev,
 		skb->data, skb->len, PCI_DMA_TODEVICE);
 	sis_priv->tx_ring[entry].cmdsts = (OWN | skb->len);
 	outl(TxENA | inl(ioaddr + cr), ioaddr + cr);
 
-	sis_priv->cur_tx ++;
-	index_cur_tx = sis_priv->cur_tx;
-	index_dirty_tx = sis_priv->dirty_tx;
-
-	for (count_dirty_tx = 0; index_cur_tx != index_dirty_tx; index_dirty_tx++)
-		count_dirty_tx ++;
-
-	if (index_cur_tx == index_dirty_tx) {
-		/* dirty_tx is met in the cycle of cur_tx, buffer full */
-		sis_priv->tx_full = 1;
-		netif_stop_queue(net_dev);
-	} else if (count_dirty_tx < NUM_TX_DESC) {
-		/* Typical path, tell upper layer that more transmission is possible */
-		netif_start_queue(net_dev);
-	} else {
-		/* buffer full, tell upper layer no more transmission */
-		sis_priv->tx_full = 1;
+        /* If Tx ring is full, tell upper layer no more transmission */
+	if (unlikely(sis_priv->cur_tx - sis_priv->dirty_tx >= NUM_TX_DESC))
 		netif_stop_queue(net_dev);
-	}
-
-	spin_unlock_irqrestore(&sis_priv->lock, flags);
 
 	net_dev->trans_start = jiffies;
 
@@ -1650,32 +1656,27 @@ static irqreturn_t sis900_interrupt(int irq, void *dev_instance)
 	u32 status;
 	unsigned int handled = 0;
 
-	spin_lock (&sis_priv->lock);
-
-	do {
-		status = inl(ioaddr + isr);
-
-		if ((status & (HIBERR|TxURN|TxERR|TxIDLE|RxORN|RxERR|RxOK)) == 0)
-			/* nothing intresting happened */
-			break;
+	while ((status = inl(ioaddr + isr))) {
 		handled = 1;
 
-		/* why dow't we break after Tx/Rx case ?? keyword: full-duplex */
-		if (status & (RxORN | RxERR | RxOK))
-			/* Rx interrupt */
-			sis900_rx(net_dev);
+		/* why don't we break after Tx/Rx case ?? 
+		 * keyword: full-duplex 
+		 */
 
-		if (status & (TxURN | TxERR | TxIDLE))
-			/* Tx interrupt */
-			sis900_finish_xmit(net_dev);
+		/* Rx interrupt */
+                if (status & (TxURN|TxERR|TxOK|RxORN|RxERR|RxOK)) {
+			/* Disable all interrupts. */
+			outl(0, ioaddr + imr);
+			netif_rx_schedule(net_dev);
+		}
 
 		/* something strange happened !!! */
-		if (status & HIBERR) {
+		if (status & HIBERR)
 			if(netif_msg_intr(sis_priv))
 				printk(KERN_INFO "%s: Abnormal interrupt,"
-					"status %#8.8x.\n", net_dev->name, status);
-			break;
-		}
+					"status %#8.8x.\n", 
+					net_dev->name, status);
+
 		if (--boguscnt < 0) {
 			if(netif_msg_intr(sis_priv))
 				printk(KERN_INFO "%s: Too much work at interrupt, "
@@ -1683,14 +1684,13 @@ static irqreturn_t sis900_interrupt(int irq, void *dev_instance)
 					net_dev->name, status);
 			break;
 		}
-	} while (1);
+	}
 
 	if(netif_msg_intr(sis_priv))
 		printk(KERN_DEBUG "%s: exiting interrupt, "
 		       "interrupt status = 0x%#8.8x.\n",
 		       net_dev->name, inl(ioaddr + isr));
 
-	spin_unlock (&sis_priv->lock);
 	return IRQ_RETVAL(handled);
 }
 
@@ -1704,25 +1704,29 @@ static irqreturn_t sis900_interrupt(int irq, void *dev_instance)
  *	don't do "too much" work here
  */
 
-static int sis900_rx(struct net_device *net_dev)
+static int sis900_rx(struct net_device *net_dev, int limit)
 {
 	struct sis900_private *sis_priv = net_dev->priv;
+	struct net_device_stats *stats = &sis_priv->stats;
 	long ioaddr = net_dev->base_addr;
-	unsigned int entry = sis_priv->cur_rx % NUM_RX_DESC;
-	u32 rx_status = sis_priv->rx_ring[entry].cmdsts;
-	int rx_work_limit;
+	int cur_rx = sis_priv->cur_rx;
+	int dirty_rx, orig_dirty_rx = sis_priv->dirty_rx;
+	int count;
 
 	if (netif_msg_rx_status(sis_priv))
-		printk(KERN_DEBUG "sis900_rx, cur_rx:%4.4d, dirty_rx:%4.4d "
-		       "status:0x%8.8x\n",
-		       sis_priv->cur_rx, sis_priv->dirty_rx, rx_status);
-	rx_work_limit = sis_priv->dirty_rx + NUM_RX_DESC - sis_priv->cur_rx;
+		printk(KERN_DEBUG "sis900_rx, cur_rx:%4.4d, dirty_rx:%4.4d\n",
+		       cur_rx, orig_dirty_rx);
 
-	while (rx_status & OWN) {
+	for (count = 0; count < limit; cur_rx++, count++) {
+		unsigned int entry;
+		u32 rx_status;
 		unsigned int rx_size;
 		unsigned int data_size;
 
-		if (--rx_work_limit < 0)
+		entry = cur_rx % NUM_RX_DESC;
+		rx_status = sis_priv->rx_ring[entry].cmdsts;
+
+		if ((rx_status & OWN) == 0)
 			break;
 
 		data_size = rx_status & DSIZE;
@@ -1735,113 +1739,71 @@ static int sis900_rx(struct net_device *net_dev)
 #endif
 
 		if (rx_status & (ABORT|OVERRUN|TOOLONG|RUNT|RXISERR|CRCERR|FAERR)) {
-			/* corrupted packet received */
-			if (netif_msg_rx_err(sis_priv))
-				printk(KERN_DEBUG "%s: Corrupted packet "
-				       "received, buffer status = 0x%8.8x/%d.\n",
-				       net_dev->name, rx_status, data_size);
-			sis_priv->stats.rx_errors++;
+			pci_unmap_single(sis_priv->pci_dev,
+				sis_priv->rx_ring[entry].bufptr, RX_BUF_SIZE,
+				PCI_DMA_FROMDEVICE);
+			dev_kfree_skb(sis_priv->rx_skbuff[entry]);
+
+			stats->rx_errors++;
 			if (rx_status & OVERRUN)
-				sis_priv->stats.rx_over_errors++;
+				stats->rx_over_errors++;
 			if (rx_status & (TOOLONG|RUNT))
-				sis_priv->stats.rx_length_errors++;
+				stats->rx_length_errors++;
 			if (rx_status & (RXISERR | FAERR))
-				sis_priv->stats.rx_frame_errors++;
+				stats->rx_frame_errors++;
 			if (rx_status & CRCERR)
-				sis_priv->stats.rx_crc_errors++;
-			/* reset buffer descriptor state */
-			sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
+				stats->rx_crc_errors++;
 		} else {
 			struct sk_buff * skb;
-			struct sk_buff * rx_skb;
 
 			pci_unmap_single(sis_priv->pci_dev,
 				sis_priv->rx_ring[entry].bufptr, RX_BUF_SIZE,
 				PCI_DMA_FROMDEVICE);
 
-			/* refill the Rx buffer, what if there is not enought
-			 * memory for new socket buffer ?? */
-			if ((skb = dev_alloc_skb(RX_BUF_SIZE)) == NULL) {
-				/*
-				 * Not enough memory to refill the buffer
-				 * so we need to recycle the old one so
-				 * as to avoid creating a memory hole
-				 * in the rx ring
-				 */
-				skb = sis_priv->rx_skbuff[entry];
-				sis_priv->stats.rx_dropped++;
-				goto refill_rx_ring;
-			}	
-
-			/* This situation should never happen, but due to
-			   some unknow bugs, it is possible that
-			   we are working on NULL sk_buff :-( */
-			if (sis_priv->rx_skbuff[entry] == NULL) {
-				if (netif_msg_rx_err(sis_priv))
-					printk(KERN_WARNING "%s: NULL pointer "
-					      "encountered in Rx ring\n"
-					      "cur_rx:%4.4d, dirty_rx:%4.4d\n",
-					      net_dev->name, sis_priv->cur_rx,
-					      sis_priv->dirty_rx);
-				break;
-			}
-
 			/* give the socket buffer to upper layers */
-			rx_skb = sis_priv->rx_skbuff[entry];
-			skb_put(rx_skb, rx_size);
-			rx_skb->protocol = eth_type_trans(rx_skb, net_dev);
-			netif_rx(rx_skb);
+			skb = sis_priv->rx_skbuff[entry];
+			skb_put(skb, rx_size);
+			skb->protocol = eth_type_trans(skb, net_dev);
+			netif_receive_skb(skb);
 
 			/* some network statistics */
 			if ((rx_status & BCAST) == MCAST)
-				sis_priv->stats.multicast++;
+				stats->multicast++;
 			net_dev->last_rx = jiffies;
-			sis_priv->stats.rx_bytes += rx_size;
-			sis_priv->stats.rx_packets++;
-			sis_priv->dirty_rx++;
-refill_rx_ring:
-			sis_priv->rx_skbuff[entry] = skb;
-			sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
-                	sis_priv->rx_ring[entry].bufptr =
-				pci_map_single(sis_priv->pci_dev, skb->data,
-					RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+			stats->rx_bytes += rx_size;
+			stats->rx_packets++;
 		}
-		sis_priv->cur_rx++;
-		entry = sis_priv->cur_rx % NUM_RX_DESC;
-		rx_status = sis_priv->rx_ring[entry].cmdsts;
-	} // while
+	}
 
-	/* refill the Rx buffer, what if the rate of refilling is slower
-	 * than consuming ?? */
-	for (; sis_priv->cur_rx != sis_priv->dirty_rx; sis_priv->dirty_rx++) {
+	/* refill the Rx ring. */
+	for (dirty_rx = orig_dirty_rx; dirty_rx < cur_rx; dirty_rx++) {
 		struct sk_buff *skb;
+		unsigned int entry;
 
-		entry = sis_priv->dirty_rx % NUM_RX_DESC;
-
-		if (sis_priv->rx_skbuff[entry] == NULL) {
-			if ((skb = dev_alloc_skb(RX_BUF_SIZE)) == NULL) {
-				/* not enough memory for skbuff, this makes a
-				 * "hole" on the buffer ring, it is not clear
-				 * how the hardware will react to this kind
-				 * of degenerated buffer */
-				if (netif_msg_rx_err(sis_priv))
-					printk(KERN_INFO "%s: Memory squeeze,"
-						"deferring packet.\n",
-						net_dev->name);
-				sis_priv->stats.rx_dropped++;
-				break;
-			}
-			sis_priv->rx_skbuff[entry] = skb;
-			sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
-                	sis_priv->rx_ring[entry].bufptr =
-				pci_map_single(sis_priv->pci_dev, skb->data,
-					RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+		if ((skb = dev_alloc_skb(RX_BUF_SIZE)) == NULL) {
+			if (netif_msg_rx_err(sis_priv))
+				printk(KERN_INFO "%s: Memory squeeze,"
+					"deferring packet.\n",
+					net_dev->name);
+			break;
 		}
+
+		entry = dirty_rx % NUM_RX_DESC;
+		sis_priv->rx_skbuff[entry] = skb;
+		sis_priv->rx_ring[entry].cmdsts = RX_BUF_SIZE;
+               	sis_priv->rx_ring[entry].bufptr =
+			pci_map_single(sis_priv->pci_dev, skb->data,
+				RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
 	}
+
 	/* re-enable the potentially idle receive state matchine */
-	outl(RxENA | inl(ioaddr + cr), ioaddr + cr );
+	if (dirty_rx != orig_dirty_rx)
+		outl(RxENA | inl(ioaddr + cr), ioaddr + cr );
 
-	return 0;
+	sis_priv->dirty_rx = dirty_rx;
+	sis_priv->cur_rx = cur_rx;
+
+	return count;
 }
 
 /**
@@ -1854,24 +1816,28 @@ refill_rx_ring:
  *	don't do "too much" work here
  */
 
-static void sis900_finish_xmit (struct net_device *net_dev)
+static int sis900_finish_xmit (struct net_device *net_dev)
 {
 	struct sis900_private *sis_priv = net_dev->priv;
+	struct net_device_stats *stats = &sis_priv->stats;
+	int cur_tx = sis_priv->cur_tx;
+	int orig_dirty_tx = sis_priv->dirty_tx;
+	int dirty_tx;
 
-	for (; sis_priv->dirty_tx != sis_priv->cur_tx; sis_priv->dirty_tx++) {
+	for (dirty_tx = orig_dirty_tx; dirty_tx < cur_tx; dirty_tx++) {
 		struct sk_buff *skb;
 		unsigned int entry;
 		u32 tx_status;
 
-		entry = sis_priv->dirty_tx % NUM_TX_DESC;
+		entry = dirty_tx % NUM_TX_DESC;
 		tx_status = sis_priv->tx_ring[entry].cmdsts;
 
-		if (tx_status & OWN) {
-			/* The packet is not transmitted yet (owned by hardware) !
-			 * Note: the interrupt is generated only when Tx Machine
-			 * is idle, so this is an almost impossible case */
+		/* The packet is not transmitted yet (owned by hardware) !
+		 * Note: the interrupt is generated only when Tx Machine
+		 * is idle, so this is an almost impossible case 
+		 */
+		if (tx_status & OWN)
 			break;
-		}
 
 		if (tx_status & (ABORT | UNDERRUN | OWCOLL)) {
 			/* packet unsuccessfully transmitted */
@@ -1879,20 +1845,20 @@ static void sis900_finish_xmit (struct net_device *net_dev)
 				printk(KERN_DEBUG "%s: Transmit "
 				       "error, Tx status %8.8x.\n",
 				       net_dev->name, tx_status);
-			sis_priv->stats.tx_errors++;
+			stats->tx_errors++;
 			if (tx_status & UNDERRUN)
-				sis_priv->stats.tx_fifo_errors++;
+				stats->tx_fifo_errors++;
 			if (tx_status & ABORT)
-				sis_priv->stats.tx_aborted_errors++;
+				stats->tx_aborted_errors++;
 			if (tx_status & NOCARRIER)
-				sis_priv->stats.tx_carrier_errors++;
+				stats->tx_carrier_errors++;
 			if (tx_status & OWCOLL)
-				sis_priv->stats.tx_window_errors++;
+				stats->tx_window_errors++;
 		} else {
 			/* packet successfully transmitted */
-			sis_priv->stats.collisions += (tx_status & COLCNT) >> 16;
-			sis_priv->stats.tx_bytes += tx_status & DSIZE;
-			sis_priv->stats.tx_packets++;
+			stats->collisions += (tx_status & COLCNT) >> 16;
+			stats->tx_bytes += tx_status & DSIZE;
+			stats->tx_packets++;
 		}
 		/* Free the original skb. */
 		skb = sis_priv->tx_skbuff[entry];
@@ -1905,13 +1871,16 @@ static void sis900_finish_xmit (struct net_device *net_dev)
 		sis_priv->tx_ring[entry].cmdsts = 0;
 	}
 
-	if (sis_priv->tx_full && netif_queue_stopped(net_dev) &&
-	    sis_priv->cur_tx - sis_priv->dirty_tx < NUM_TX_DESC - 4) {
-		/* The ring is no longer full, clear tx_full and schedule
-		 * more transmission by netif_wake_queue(net_dev) */
-		sis_priv->tx_full = 0;
+	/* The ring is no longer full, schedule
+	 * more transmission by netif_wake_queue(net_dev) 
+	 */
+	if (netif_queue_stopped(net_dev) && 
+	    (cur_tx - dirty_tx < NUM_TX_DESC - 16))
 		netif_wake_queue (net_dev);
-	}
+
+	sis_priv->dirty_tx = dirty_tx;
+
+	return dirty_tx - orig_dirty_tx;
 }
 
 /**
@@ -2462,7 +2431,7 @@ static int sis900_resume(struct pci_dev *pci_dev)
 	sis900_set_mode(ioaddr, HW_SPEED_10_MBPS, FDX_CAPABLE_HALF_SELECTED);
 
 	/* Enable all known interrupts by setting the interrupt mask. */
-	outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxIDLE), ioaddr + imr);
+	outl((RxSOVR|RxORN|RxERR|RxOK|TxURN|TxERR|TxOK), ioaddr + imr);
 	outl(RxENA | inl(ioaddr + cr), ioaddr + cr);
 	outl(IE, ioaddr + ier);
 
diff --git a/drivers/net/sis900.h b/drivers/net/sis900.h
index 150511a..671af28 100644
--- a/drivers/net/sis900.h
+++ b/drivers/net/sis900.h
@@ -319,8 +319,8 @@ enum sis630_revision_id {
 #define TX_BUF_SIZE     (MAX_FRAME_SIZE+18)
 #define RX_BUF_SIZE     (MAX_FRAME_SIZE+18)
 
-#define NUM_TX_DESC     16      	/* Number of Tx descriptor registers. */
-#define NUM_RX_DESC     16       	/* Number of Rx descriptor registers. */
+#define NUM_TX_DESC     64    	/* Number of Tx descriptor registers. */
+#define NUM_RX_DESC     64    	/* Number of Rx descriptor registers. */
 #define TX_TOTAL_SIZE	NUM_TX_DESC*sizeof(BufferDesc)
 #define RX_TOTAL_SIZE	NUM_RX_DESC*sizeof(BufferDesc)
 

[-- Attachment #3: sis900_testingv2.txt --]
[-- Type: text/plain, Size: 2025 bytes --]

Tested using pktgen.

All test were run on the same H/W. The CPU clock was changed from the BIOS
and the machine rebooted before each iteration.

Results in pps. Sending 4000000 60-byte packets.

Iteration 0 (under-clocked 1052.476 MHz):
Cpu(s):  0.3%us, 13.6%sy,  0.0%ni,  0.0%id,  0.0%wa, 31.2%hi, 54.8%si,  0.0%st
Result: OK: 28910148(c28791584+d118564) usec, 4000000 (60byte,0frags)
  138359pps 66Mb/sec (66412320bps) errors: 0
Interrupts: 3234740

Iteration 1 (normal 1397.657 MHz):
Cpu(s):  0.3%us, 20.9%sy,  0.0%ni,  0.0%id,  0.0%wa, 29.9%hi, 48.8%si,  0.0%st
Result: OK: 26947273(c22637342+d4309931) usec, 4000000 (60byte,0frags)
  148438pps 71Mb/sec (71250240bps) errors: 0
Interrupts: 3998176

Iteration 2 (over-clocked 1575.819 MHz):
Cpu(s):  0.3%us, 33.0%sy,  0.0%ni,  0.0%id,  0.0%wa, 27.3%hi, 39.3%si,  0.0%st
Result: OK: 26937148(c21656005+d5281143) usec, 4000000 (60byte,0frags)
  148493pps 71Mb/sec (71276640bps) errors: 0
Interrupts: 3999634

The next few iterations are with a change to the driver. Modified finish_xmit
to only wake the transmit queue when there are least 16 free spots in the tx
ring. Previously, the driver would wake the transmit queue when there was at
least 1 free spot in the tx ring. This should add some hysteresis.

Iteration 3 (under-clocked 1052.476 MHz):
Cpu(s):  0.3%us, 16.3%sy,  0.0%ni,  0.0%id,  0.0%wa, 30.0%hi, 53.3%si,  0.0%st
Result: OK: 28246751(c28169436+d77315) usec, 4000000 (60byte,0frags)
  141609pps 67Mb/sec (67972320bps) errors: 0
Interrupts: 3227925

Iteration 4 (normal 1397.657 MHz):
Cpu(s):  0.3%us, 23.7%sy,  0.0%ni,  0.0%id,  0.0%wa, 30.0%hi, 46.0%si,  0.0%st
Result: OK: 26935554(c25058872+d1876682) usec, 4000000 (60byte,0frags)
  148502pps 71Mb/sec (71280960bps) errors: 0
Interrupts: 3994491

Iteration 5 (over-clocked 1575.819 MHz):
Cpu(s):  0.3%us, 30.8%sy,  0.0%ni,  0.0%id,  0.0%wa, 27.2%hi, 41.7%si,  0.0%st
Result: OK: 26933751(c23148154+d3785597) usec, 4000000 (60byte,0frags)
  148512pps 71Mb/sec (71285760bps) errors: 0
Interrupts: 3999595


^ permalink raw reply related

* Re: [PATCH 1/2]: [NET_SCHED]: Make all rate based scheduler work with TSO.
From: Jesper Dangaard Brouer @ 2007-09-05  9:17 UTC (permalink / raw)
  To: Bill Fink; +Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org
In-Reply-To: <20070904134015.18a3e7ca.billfink@mindspring.com>

On Tue, 2007-09-04 at 13:40 -0400, Bill Fink wrote:
> On Tue, 04 Sep 2007, Patrick McHardy wrote:
> 
> > Bill Fink wrote:
> > > On Sat, 1 Sep 2007, Jesper Dangaard Brouer wrote:
> > >
> > Yes, you need to specify the MTU on the command line for
> > jumbo frames.
> 
> Thanks!  Works much better now, although it does slightly exceed
> the specified rate.

Thats what happens, with the current rate table system, as we use the
lower boundry (when doing the packet to time lookups). Especially with a
high MTU, as the "resolution" of the rate table diminish (mpu=9000 gives
cell_log=6, 2^6=64 bytes "resolution" buckets).

> [root@lang4 ~]# tc qdisc add dev eth2 root tbf rate 2gbit buffer 5000000 limit 18000 mtu 9000
> 
> [root@lang4 ~]# ./nuttcp-5.5.5 -w10m 192.168.88.14
>  2465.6729 MB /  10.08 sec = 2051.8241 Mbps 19 %TX 13 %RX
> 
> [root@lang4 ~]# ./nuttcp-5.5.5 -M1460 -w10m 192.168.88.14
>  2785.5000 MB /  10.00 sec = 2335.6569 Mbps 100 %TX 26 %RX

BTW, thats a fast connection you are having! :-)

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk

^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: James Chapman @ 2007-09-05 12:03 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Daniele Venzano, hadi, davem, rick.jones2, msb, netdev, grundler,
	robert.olsson, jeff, nhorman
In-Reply-To: <20070905074411.GA4815@ludhiana>

Mandeep Singh Baines wrote:
> Daniele Venzano (venza@brownhat.org) wrote:
>> The patch looks good and I think it can be pushed higher (-mm ?) for some wider 
>> testing. I don't have the hardware available to do some tests myself, 
>> unfortunately, but it would be similar to yours anyway.
>>
>> I'd like to know how this works for people with less memory and slower CPU, but any 
>> kind of test run will be appreciated.
> 
> Hi Daniele,
> 
> I tested the driver under different setting of CPU clock. Under a lower clock,
> I get less interrupts (what I was hoping for) and slightly less performance.
> Under higher clock, I get better performance and almost 1 interrupt per packet.

I have a patch that solves the high interrupt rate problem by keeping 
the driver in polled mode longer. It's written for the latest NAPI 
version that DaveM posted recently. I'll try to get some time to write 
it up and post it for comment.

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development


^ permalink raw reply

* Re: Oops in 2.6.22.1: skb_copy_and_csum_datagram_iovec()
From: Neil Brown @ 2007-09-05 12:05 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Chuck Ebbert, netdev
In-Reply-To: <E1INkpM-00007x-00@gondolin.me.apana.org.au>

On Wednesday August 22, herbert@gondor.apana.org.au wrote:
> Chuck Ebbert <cebbert@redhat.com> wrote:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253290
> > 
> > 18:57:54 osama kernel:  [<c05be67f>] kernel_recvmsg+0x31/0x40
> > 18:57:54 osama kernel:  [<e0bc52d4>] svc_udp_recvfrom+0x114/0x368 [sunrpc]
> 
> svc_udp_recvfrom is calling kernel_recvmsg with iov == NULL.

iov == NULL used to work.

I think it stopped working at 
   commit 759e5d006462d53fb708daa8284b4ad909415da1

Previously, as len==0, MSG_TRUNC would get set, so copy_only would get
set, so skb_copy_datagram_iovec would get called, and that handles a
len of 0.

Now, skb_copy_and_csum_datagram_iovec gets called unless
skb_csum_unnecessary(skb), which now kills us.

We could 'fix' it by making skb_copy_and_csum_datagram_iovec just
return if len==0, or don't call it from udp_recvmsg in that case.

The reason that I call kernel_recvmsg with iov == NULL is that I want
to collect the source/dest addresses of the packet, but I don't want
to actually read the data - I want to just get a reference to the skb
(to avoid needless copy where possible) later.
So I call kernel_recvmsg with MSG_PEEK, and don't bother with an
iovec.

I guess a could send an empty iovec rather than a NULL, but it used to
work.... 

NeilBrown

^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: jamal @ 2007-09-05 12:33 UTC (permalink / raw)
  To: James Chapman
  Cc: Mandeep Singh Baines, Daniele Venzano, davem, rick.jones2, msb,
	netdev, grundler, robert.olsson, jeff, nhorman
In-Reply-To: <46DE9B09.7060603@katalix.com>

On Wed, 2007-05-09 at 13:03 +0100, James Chapman wrote:

> I have a patch that solves the high interrupt rate problem by keeping 
> the driver in polled mode longer. It's written for the latest NAPI 
> version that DaveM posted recently. I'll try to get some time to write 
> it up and post it for comment.

Theres interesting challenges to be tackled with resolving that. 
Just so you dont reinvent the wheel or to help invent a better one;
please read:
http://kernel.org/pub/linux/kernel/people/hadi/docs/UKUUG2005.pdf

if you have, what are the differences in your approach - and when
do you find it useful?

cheers,
jamal

PS:- Mandeep, i was going to suggest thresholds but you beat me to it
with your latest patch.

^ permalink raw reply

* Re: Oops in 2.6.22.1: skb_copy_and_csum_datagram_iovec()
From: Neil Brown @ 2007-09-05 12:50 UTC (permalink / raw)
  To: Herbert Xu, Chuck Ebbert, netdev
In-Reply-To: <18142.39804.387020.852675@notabene.brown>

On Wednesday September 5, neilb@suse.de wrote:
> On Wednesday August 22, herbert@gondor.apana.org.au wrote:
> > Chuck Ebbert <cebbert@redhat.com> wrote:
> > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253290
> > > 
> > > 18:57:54 osama kernel:  [<c05be67f>] kernel_recvmsg+0x31/0x40
> > > 18:57:54 osama kernel:  [<e0bc52d4>] svc_udp_recvfrom+0x114/0x368 [sunrpc]
> > 
> > svc_udp_recvfrom is calling kernel_recvmsg with iov == NULL.
> 
> iov == NULL used to work.
> 
> I think it stopped working at 
>    commit 759e5d006462d53fb708daa8284b4ad909415da1
> 
> Previously, as len==0, MSG_TRUNC would get set, so copy_only would get
> set, so skb_copy_datagram_iovec would get called, and that handles a
> len of 0.
> 
> Now, skb_copy_and_csum_datagram_iovec gets called unless
> skb_csum_unnecessary(skb), which now kills us.

Actually, the new code is broken for more reasons than that.
In core/datagram.c, the comment for skb_copy_and_csum_datagram_iovec,
it says:
 *	Caller _must_ check that skb will fit to this iovec.

but udp_recvmsg doesn't.

It seems to try:
	if (copied < ulen || UDP_SKB_CB(skb)->partial_cov) {
		if (udp_lib_checksum_complete(skb))
			goto csum_copy_err;
	}

	if (skb_csum_unnecessary(skb))
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
					      msg->msg_iov, copied       );

so it doesn't call skb_copy_datagram_iovec if "copied < ulen".

However earlier there is:
	ulen = skb->len - sizeof(struct udphdr);
	copied = len;
	if (copied > ulen)
		copied = ulen;

so if the 'len' (of the iovec) is too small, we end up with "copied == ulen", 

so udp_lib_checksum_complete doesn't get called....


> 
> We could 'fix' it by making skb_copy_and_csum_datagram_iovec just
> return if len==0, or don't call it from udp_recvmsg in that case.
> 

So the latter of these is needed.

NeilBrown

^ permalink raw reply

* Re: Fwd: That whole "Linux stealing our code" thing
From: David H. Lynch Jr. @ 2007-09-05 13:22 UTC (permalink / raw)
  To: netdev
In-Reply-To: <Pine.LNX.4.64.0709021754250.6772@condmat1.ciencias.uniovi.es>

Igor Sobrado wrote:
>>
>> There is definite value in sharing the ath5k HAL between OpenBSD and 
>> Linux.
>
> Of course. Sharing knowledge and efforts can only improve both the GPL 
> and BSD licensed code. It is important in all cases, but becomes 
> critical when support from manufacturers is limited or even non existent.

Cooperation between OpenBSD developers and Linux developers, wwould be 
wonderful, but this appears to just be the latest of a number of 
disputes that have devolved into legalism and acrimony.

The time wasted fighting over this seems significantly larger than the 
effort need to solve it.

The respective BSD and GPL licensed code is open documentation for the 
programming of the typically closed device.
What is wrong with chosing to rewrite the drivers in contention ?
If the level of bile is sufficiently high it might make sense to do so 
using "clean room" techniques, where one developer uses the source 
licensed driver as the basis for writing documentation
and another developer uses the documentation as the basis for writing a 
new driver. The original author could/should still be credited.

It might even make sense to use projects like this as a means of 
recruiting new driver developers and building their skills - drafting 
prospective kernel developers from kernel-newbies, or asking for
volunteers on the appropriate lists.
Not having looked at the code for either the Linux or BSD atheros 
driver, but having some limited linux network driver experience, I would 
be happy to make an attempt at writing a clean Linux GPL
driver for atheros cards.

Another benefit to this approach is it might cool tempers. Neither the 
GPL not the BSD/ISC Licenses protect the information their authors have 
painstakingly extracted about the hardware.
If both sides recognize that copyright - particularly Open Source 
copyright licenses do not somehow make the ideas and information they 
express proprietary, and that all that is really
in contention is credit and a reduction in labor, then maybe it would be 
easier to get them to agree to modifying licenses.

Dave Lynch 					  	    DLA Systems
Software Development:  				         Embedded Linux
717.627.3770 	       dhlii@dlasys.net 	  http://www.dlasys.net
fax: 1.253.369.9244 			           Cell: 1.717.587.7774
Over 25 years' experience in platforms, languages, and technologies too numerous to list.

"Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction."
Albert Einstein

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox