Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: af_packet: don't enable global timestamps
From: Eric Dumazet @ 2007-09-04  7:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, Andi Kleen, netdev
In-Reply-To: <20070904063525.060e7a77@oldman>

On Tue, 4 Sep 2007 06:35:25 +0100
Stephen Hemminger <shemminger@linux-foundation.org> wrote:

> Andi mentioned he did something like this already, but never
> submitted it.
> 
> The dhcp client application uses AF_PACKET with a packet filter to
> receive data. The application doesn't even use timestamps, but because
> the AF_PACKET API has timestamps, they get turned on globally which
> causes an expensive time of day lookup for every packet received
> on any system that uses the standard DHCP client.
> 
> The fix is to not enable the timestamp (but use if if available).
> This causes the time lookup to only occur on those packets
> that are destined for the AF_PACKET socket. 
> The timestamping occurs after packet filtering
> so all packets dropped by filtering to not cause a clock call.
> 
> The one downside of this a a few microseconds additional delay
> added from the normal timestamping location (netif_rx) until the
> receive callback in AF_PACKET. But since the offset is fairly consistent
> it should not upset applications that do want really use timestamps,
> like wireshark.

This patch seems the correct fix for this longstanding problem.

Please note that if wireshark/tcpdump processes really want precise timestamps, 
they still can ask this by using setsockopt(SO_TIMESTAMP or SO_TIMESTAMPNS) on their private socket.

(We already know that running a sniffer has a cost anyway)

> 
> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
> 
> 
> --- a/net/packet/af_packet.c	2007-07-23 09:31:26.000000000 +0100
> +++ b/net/packet/af_packet.c	2007-09-03 14:55:00.000000000 +0100
> @@ -640,11 +640,10 @@ static int tpacket_rcv(struct sk_buff *s
>  	h->tp_snaplen = snaplen;
>  	h->tp_mac = macoff;
>  	h->tp_net = netoff;
> -	if (skb->tstamp.tv64 == 0) {
> -		__net_timestamp(skb);
> -		sock_enable_timestamp(sk);
> -	}
> -	tv = ktime_to_timeval(skb->tstamp);
> +	if (skb->tstamp.tv64)
> +		tv = ktime_to_timeval(skb->tstamp);
> +	else
> +		do_gettimeofday(&tv);
>  	h->tp_sec = tv.tv_sec;
>  	h->tp_usec = tv.tv_usec;
>  


^ permalink raw reply

* [PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
From: Micah Gruber @ 2007-09-04  8:06 UTC (permalink / raw)
  To: linux-kernel, netdev, jgarzik

This patch fixes a potential null dereference bug where we dereference dev before a null check. This patch simply moves the dereferencing after the null check.

Signed-off-by: Micah Gruber <micah.gruber@gmail.com>
---

--- a/drivers/net/tulip/uli526x.c
+++ b/drivers/net/tulip/uli526x.c
@@ -663,7 +663,7 @@
 {
 	struct net_device *dev = dev_id;
 	struct uli526x_board_info *db = netdev_priv(dev);
-	unsigned long ioaddr;
+	unsigned long ioaddr = dev->base_addr;
 	unsigned long flags;
 
 	if (!dev) {
@@ -671,8 +671,6 @@
 		return IRQ_NONE;
 	}
 
-	ioaddr = dev->base_addr;
-
 	spin_lock_irqsave(&db->lock, flags);
 	outl(0, ioaddr + DCR7);


^ permalink raw reply

* [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
From: Micah Gruber @ 2007-09-04  8:14 UTC (permalink / raw)
  To: linux-kernel, netdev, jgarzik

This patch fixes a potential null dereference bug where we dereference dev before a null check. This patch simply moves the dereferencing after the null check.

Signed-off-by: Micah Gruber <micah.gruber@gmail.com>
---

--- a/drivers/net/tulip/uli526x.c
+++ b/drivers/net/tulip/uli526x.c
@@ -663,7 +663,7 @@
 {
 	struct net_device *dev = dev_id;
 	struct uli526x_board_info *db = netdev_priv(dev);
-	unsigned long ioaddr = dev->base_addr;
+	unsigned long ioaddr;
 	unsigned long flags;
 
 	if (!dev) {
@@ -671,6 +671,8 @@
 		return IRQ_NONE;
 	}
 
+	ioaddr = dev->base_addr;
+
 	spin_lock_irqsave(&db->lock, flags);
 	outl(0, ioaddr + DCR7);


^ permalink raw reply

* Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
From: Steffen Klassert @ 2007-09-04  8:16 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Linux Kernel Mailing List, Linux Netdev Mailing List, Jeff Garzik,
	Mark Hindley
In-Reply-To: <alpine.LFD.0.999.0709040340250.8504@enigma.security.iitk.ac.in>

On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:
> 
> drivers/net/3c59x.c: In function 'vortex_up':
> drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this function

This came in with the recently applied 3c59x-check-return-of-pci_enable_device patch
from Mark Hindley. I just compiled it on a PCI only machine so far, therefore I did
not notice the warning yet.

> 
> is a genuine bug. The function returns an uninitialized value of 'err'
> back to the caller, which expects it to be 0 for success cases. Let's
> fix this by explicitly initializing 'err' to zero.
> 
> Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>

^ permalink raw reply

* Re: [PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
From: Satyam Sharma @ 2007-09-04  8:36 UTC (permalink / raw)
  To: Micah Gruber
  Cc: Linux Kernel Mailing List, Linux Netdev Mailing List, jgarzik
In-Reply-To: <46DD121A.2010404@gmail.com>

Hi Micah,

On Tue, 4 Sep 2007, Micah Gruber wrote:

> This patch fixes a potential null dereference bug where we dereference
> dev before a null check. This patch simply moves the dereferencing after
> the null check.
> 
> Signed-off-by: Micah Gruber <micah.gruber@gmail.com>
> ---
> 
> --- a/drivers/net/tulip/uli526x.c
> +++ b/drivers/net/tulip/uli526x.c
> @@ -663,7 +663,7 @@
>  {
>  	struct net_device *dev = dev_id;
>  	struct uli526x_board_info *db = netdev_priv(dev);
> -	unsigned long ioaddr;
> +	unsigned long ioaddr = dev->base_addr;
>  	unsigned long flags;
>  
>  	if (!dev) {
> @@ -671,8 +671,6 @@
>  		return IRQ_NONE;
>  	}
>  
> -	ioaddr = dev->base_addr;
> -
>  	spin_lock_irqsave(&db->lock, flags);
>  	outl(0, ioaddr + DCR7);

[The patch is reversed.]

But it is also complete -- you've left out the "netdev_priv(dev)"
statement _just_ before the dereference that you've pushed down.
Coverity wasn't smart enough to tell, but that's actually a dev->priv,
so we'll _still_ be dereferencing dev before checking it for NULL below.
And lastly, the patch isn't quite correct. It is the (!dev) check that
is redundant and must be removed instead :-)

I bet more such pointless checks exist all over. It makes more sense to
remove them than unnecessarily try to solve "potential" NULL dereferences,
that we (naturally) never saw earlier, because they're impossible. ISTR
pointing out two similar patches a month or so back when Jeff pushed net
driver patches upstream, IIRC ... ]

Satyam

^ permalink raw reply

* Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
From: Satyam Sharma @ 2007-09-04  8:39 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Linux Kernel Mailing List, Linux Netdev Mailing List, Jeff Garzik,
	Mark Hindley
In-Reply-To: <20070904081602.GB4241@newton.mathematik.tu-chemnitz.de>

Hi Steffen,


On Tue, 4 Sep 2007, Steffen Klassert wrote:

> On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:
> > 
> > drivers/net/3c59x.c: In function 'vortex_up':
> > drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this function
> 
> This came in with the recently applied 3c59x-check-return-of-pci_enable_device patch
> from Mark Hindley. I just compiled it on a PCI only machine so far, therefore I did
> not notice the warning yet.

Hmm, the .config I built with had PCI=y as well. Probably a compiler
version difference -- Jeff also mentioned yesterday that some newer
GCC versions fail to warn about uninitialized variables cases.


> > is a genuine bug. The function returns an uninitialized value of 'err'
> > back to the caller, which expects it to be 0 for success cases. Let's
> > fix this by explicitly initializing 'err' to zero.
> > 
> > Signed-off-by: Satyam Sharma <satyam@infradead.org>
> Acked-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>


Thanks,

Satyam

^ permalink raw reply

* Re: [PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c
From: Satyam Sharma @ 2007-09-04  8:50 UTC (permalink / raw)
  To: Micah Gruber
  Cc: Linux Kernel Mailing List, Linux Netdev Mailing List, jgarzik
In-Reply-To: <alpine.LFD.0.999.0709041352450.13651@enigma.security.iitk.ac.in>



On Tue, 4 Sep 2007, Satyam Sharma wrote:

> Hi Micah,
> 
> 
> On Tue, 4 Sep 2007, Micah Gruber wrote:
> 
> > This patch fixes a potential null dereference bug where we dereference
> > dev before a null check. This patch simply moves the dereferencing after
> > the null check.
> > 
> > Signed-off-by: Micah Gruber <micah.gruber@gmail.com>
> > ---
> > 
> > --- a/drivers/net/tulip/uli526x.c
> > +++ b/drivers/net/tulip/uli526x.c
> > @@ -663,7 +663,7 @@
> >  {
> >  	struct net_device *dev = dev_id;
> >  	struct uli526x_board_info *db = netdev_priv(dev);
> > -	unsigned long ioaddr;
> > +	unsigned long ioaddr = dev->base_addr;
> >  	unsigned long flags;
> >  
> >  	if (!dev) {
> > @@ -671,8 +671,6 @@
> >  		return IRQ_NONE;
> >  	}
> >  
> > -	ioaddr = dev->base_addr;
> > -
> >  	spin_lock_irqsave(&db->lock, flags);
> >  	outl(0, ioaddr + DCR7);
> 
> 
> [The patch is reversed.]
> 
> But it is also complete -- you've left out the "netdev_priv(dev)"
                 ^^^^^^^^
I meant:       incomplete

> statement _just_ before the dereference that you've pushed down.
> Coverity wasn't smart enough to tell, but that's actually a dev->priv,
> so we'll _still_ be dereferencing dev before checking it for NULL below.
> And lastly, the patch isn't quite correct. It is the (!dev) check that
> is redundant and must be removed instead :-)
> 
> I bet more such pointless checks exist all over. It makes more sense to
> remove them than unnecessarily try to solve "potential" NULL dereferences,
> that we (naturally) never saw earlier, because they're impossible. ISTR
> pointing out two similar patches a month or so back when Jeff pushed net
> driver patches upstream, IIRC ... ]
> 
> 
> Satyam

^ permalink raw reply

* Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
From: Mark Hindley @ 2007-09-04  8:53 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Steffen Klassert, Linux Kernel Mailing List,
	Linux Netdev Mailing List, Jeff Garzik
In-Reply-To: <20070904081602.GB4241@newton.mathematik.tu-chemnitz.de>

On Tue, Sep 04, 2007 at 02:09:47PM +0530, Satyam Sharma wrote:
> Hi Steffen,
> 
> 
> On Tue, 4 Sep 2007, Steffen Klassert wrote:
> 
> > On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:
> > > 
> > > drivers/net/3c59x.c: In function 'vortex_up':
> > > drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this function
> > 
> > This came in with the recently applied 3c59x-check-return-of-pci_enable_device patch
> > from Mark Hindley. I just compiled it on a PCI only machine so far, therefore I did
> > not notice the warning yet.
> 
> Hmm, the .config I built with had PCI=y as well. Probably a compiler
> version difference -- Jeff also mentioned yesterday that some newer
> GCC versions fail to warn about uninitialized variables cases.
> 

Sorry, this is my bad. I have just checked: there is no warning with gcc
4.2 or 4.1, but 3.3 emits the warning. 

> 
> > > is a genuine bug. The function returns an uninitialized value of 'err'
> > > back to the caller, which expects it to be 0 for success cases. Let's
> > > fix this by explicitly initializing 'err' to zero.
> > > 
> > > Signed-off-by: Satyam Sharma <satyam@infradead.org>
> > Acked-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>

Acked-by: Mark Hindley <mark@hindley.org.uk>

^ permalink raw reply

* Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
From: Steffen Klassert @ 2007-09-04  9:17 UTC (permalink / raw)
  To: Mark Hindley
  Cc: Satyam Sharma, Linux Kernel Mailing List,
	Linux Netdev Mailing List, Jeff Garzik
In-Reply-To: <20070904085331.GH5944@hindley.org.uk>

On Tue, Sep 04, 2007 at 09:53:31AM +0100, Mark Hindley wrote:
> On Tue, Sep 04, 2007 at 02:09:47PM +0530, Satyam Sharma wrote:
> > Hi Steffen,
> > 
> > 
> > On Tue, 4 Sep 2007, Steffen Klassert wrote:
> > 
> > > On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:
> > > > 
> > > > drivers/net/3c59x.c: In function 'vortex_up':
> > > > drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this function
> > > 
> > > This came in with the recently applied 3c59x-check-return-of-pci_enable_device patch
> > > from Mark Hindley. I just compiled it on a PCI only machine so far, therefore I did
> > > not notice the warning yet.
> > 
> > Hmm, the .config I built with had PCI=y as well. Probably a compiler
> > version difference -- Jeff also mentioned yesterday that some newer
> > GCC versions fail to warn about uninitialized variables cases.
> > 
> 
> Sorry, this is my bad. I have just checked: there is no warning with gcc
> 4.2 or 4.1, but 3.3 emits the warning. 
> 

The only warning that I was able to trigger with gcc 4.2 is in the case of a .config
without PCI support. In this case I get

drivers/net/3c59x.c: In function 'vortex_up':
drivers/net/3c59x.c:1672: warning: 'err' is used uninitialized in this function
                                         ^^
So we have to be more carefull with the newer gcc versions.

Steffen


^ permalink raw reply

* Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
From: Mark Hindley @ 2007-09-04  9:35 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Satyam Sharma, Linux Kernel Mailing List,
	Linux Netdev Mailing List, Jeff Garzik
In-Reply-To: <20070904085331.GH5944@hindley.org.uk>

On Tue, Sep 04, 2007 at 11:17:57AM +0200, Steffen Klassert wrote:
 
> The only warning that I was able to trigger with gcc 4.2 is in the case of a .config
> without PCI support. In this case I get
> 
> drivers/net/3c59x.c: In function 'vortex_up':
> drivers/net/3c59x.c:1672: warning: 'err' is used uninitialized in this function

That should be fixed by Satyam's patch too.

Mark

^ permalink raw reply

* Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug
From: Steffen Klassert @ 2007-09-04  9:39 UTC (permalink / raw)
  To: Mark Hindley
  Cc: Satyam Sharma, Linux Kernel Mailing List,
	Linux Netdev Mailing List, Jeff Garzik
In-Reply-To: <20070904093510.GL5944@hindley.org.uk>

On Tue, Sep 04, 2007 at 10:35:10AM +0100, Mark Hindley wrote:
> On Tue, Sep 04, 2007 at 11:17:57AM +0200, Steffen Klassert wrote:
>  
> > The only warning that I was able to trigger with gcc 4.2 is in the case of a .config
> > without PCI support. In this case I get
> > 
> > drivers/net/3c59x.c: In function 'vortex_up':
> > drivers/net/3c59x.c:1672: warning: 'err' is used uninitialized in this function
> 
> That should be fixed by Satyam's patch too.

Yes, of course.
This was just to point out what's  possible to trigger with a newer gcc and what's not.

Steffen

^ permalink raw reply

* Re: some weird corruption in net-2.6.24
From: Thomas Graf @ 2007-09-04 11:24 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, netdev
In-Reply-To: <E1ISKzA-0007xM-00@gondolin.me.apana.org.au>

* Herbert Xu <herbert@gondor.apana.org.au> 2007-09-04 07:05
> Thomas Graf <tgraf@suug.ch> wrote:
> > 
> > I've been trying to reproduce this, what happens on my system
> > is that when the ISAKMP SA lifetime is exceeded the rekeying
> > fails and my connection dies. I can reproduce this back to
> > 2.6.22 and it doesn't seem related to my recent xfrm_user work.
> > It looks like this behaviour is hiding the bug you are seeing.
> 
> Could you try extending the ISAKMP SA life time so that it is
> longer than the IPSec SA life time?

Yes, in this case the IPSec SA rekeying works just fine.
I can't spot any signs of corruptions or alike.

^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: jamal @ 2007-09-04 13:03 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Daniele Venzano, davem, rick.jones2, msb, netdev, grundler,
	robert.olsson, jeff, nhorman
In-Reply-To: <20070904032036.GA11153@ludhiana>

On Mon, 2007-03-09 at 20:20 -0700, Mandeep Singh Baines wrote:

> I didn't see much saving in interrupts on my machine (too fast, I guess). 

You could try the idea suggested by Dave earlier and just turn interupts
for every nth packet. That should cut down the numbers.

> I did see a significant boost to tx performance by optimizing start_xmit: more
> than double pps in pktgen.

148Kpps on a slow piece of hardware aint bad - Good Stuff. I wonder how
much CPU is being abused.

If you wanna go one extra mile (separate future patch): get rid of that
tx lock and use netif_tx_lock on the interupt path. Look at some sane
driver like tg3 for reference.

cheers,
jamal

^ permalink raw reply

* [NET] sgiseeq: replace use of dma_cache_wback_inv
From: Ralf Baechle @ 2007-09-04 13:41 UTC (permalink / raw)
  To: Andrew Morton, Jeff Garzik, netdev

The sgiseeq driver is one of the few remaining users of the ancient
cache banging DMA API.  Replaced with the modern days DMA API.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

diff --git a/drivers/net/sgiseeq.c b/drivers/net/sgiseeq.c
index 0fb74cb..eb67b02 100644
--- a/drivers/net/sgiseeq.c
+++ b/drivers/net/sgiseeq.c
@@ -75,6 +75,7 @@ struct sgiseeq_init_block { /* Note the name ;-) */
 
 struct sgiseeq_private {
 	struct sgiseeq_init_block *srings;
+	dma_addr_t srings_dma;
 
 	/* Ptrs to the descriptors in uncached space. */
 	struct sgiseeq_rx_desc *rx_desc;
@@ -643,13 +644,20 @@ static int __init sgiseeq_probe(struct platform_device *pdev)
 	sp = netdev_priv(dev);
 
 	/* Make private data page aligned */
-	sr = (struct sgiseeq_init_block *) get_zeroed_page(GFP_KERNEL);
+	sr = dma_alloc_coherent(&pdev->dev, sizeof(*sp->srings),
+				&sp->srings_dma, GFP_KERNEL);
 	if (!sr) {
 		printk(KERN_ERR "Sgiseeq: Page alloc failed, aborting.\n");
 		err = -ENOMEM;
 		goto err_out_free_dev;
 	}
 	sp->srings = sr;
+	sp->rx_desc = sp->srings->rxvector;
+	sp->tx_desc = sp->srings->txvector;
+
+	/* A couple calculations now, saves many cycles later. */
+	setup_rx_ring(sp->rx_desc, SEEQ_RX_BUFFERS);
+	setup_tx_ring(sp->tx_desc, SEEQ_TX_BUFFERS);
 
 	memcpy(dev->dev_addr, pd->mac, ETH_ALEN);
 
@@ -662,19 +670,6 @@ static int __init sgiseeq_probe(struct platform_device *pdev)
 	sp->name = sgiseeqstr;
 	sp->mode = SEEQ_RCMD_RBCAST;
 
-	sp->rx_desc = (struct sgiseeq_rx_desc *)
-	              CKSEG1ADDR(ALIGNED(&sp->srings->rxvector[0]));
-	dma_cache_wback_inv((unsigned long)&sp->srings->rxvector,
-	                    sizeof(sp->srings->rxvector));
-	sp->tx_desc = (struct sgiseeq_tx_desc *)
-	              CKSEG1ADDR(ALIGNED(&sp->srings->txvector[0]));
-	dma_cache_wback_inv((unsigned long)&sp->srings->txvector,
-	                    sizeof(sp->srings->txvector));
-
-	/* A couple calculations now, saves many cycles later. */
-	setup_rx_ring(sp->rx_desc, SEEQ_RX_BUFFERS);
-	setup_tx_ring(sp->tx_desc, SEEQ_TX_BUFFERS);
-
 	/* Setup PIO and DMA transfer timing */
 	sp->hregs->pconfig = 0x161;
 	sp->hregs->dconfig = HPC3_EDCFG_FIRQ | HPC3_EDCFG_FEOP |
@@ -732,7 +727,8 @@ static int __exit sgiseeq_remove(struct platform_device *pdev)
 	struct sgiseeq_private *sp = netdev_priv(dev);
 
 	unregister_netdev(dev);
-	free_page((unsigned long) sp->srings);
+	dma_free_coherent(&pdev->dev, sizeof(*sp->srings), sp->srings,
+	                  sp->srings_dma);
 	free_netdev(dev);
 	platform_set_drvdata(pdev, NULL);
 

^ permalink raw reply related

* RE: Ethernet weirdness on 82xx
From: Rune Torgersen @ 2007-09-04 14:06 UTC (permalink / raw)
  To: Denys, linuxppc-embedded, netdev
In-Reply-To: <20070902110957.M44221@visp.net.lb>

> -----Original Message-----
> From: Denys [mailto:denys@visp.net.lb] 
> Sent: Sunday, September 02, 2007 6:12 AM
> To: Rune Torgersen; linuxppc-embedded@ozlabs.org; 
> netdev@vger.kernel.org
> Subject: Re: Ethernet weirdness on 82xx
> 
> Thats normal.
> Check arp_filter sysctl :
> arp_filter - BOOLEAN

Thank you, that was it.

Did the following, and now I don't get the double resond anymore

#only respond to arps on the correct port
echo 1 > /proc/sys/net/ipv4/conf/all/arp_announce
echo 2 > /proc/sys/net/ipv4/conf/all/arp_ignore

^ permalink raw reply

* Re: [PATCH] Massive net driver stats cleanup
From: Jeremy Fitzhardinge @ 2007-09-04 15:09 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, LKML, Andrew Morton
In-Reply-To: <20070901162505.GA2944@havoc.gtf.org>

Jeff Garzik wrote:
>  drivers/net/xen-netfront.c |   26 +++--------
>   

Hey, look, its identical to the patch I have here.

Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>

    J

^ permalink raw reply

* Re: [PATCH RFC] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts with the host stack.
From: Steve Wise @ 2007-09-04 15:13 UTC (permalink / raw)
  To: Roland Dreier; +Cc: general, ewg, netdev, linux-kernel, sean.hefty
In-Reply-To: <adaveawwcpz.fsf@cisco.com>

Roland Dreier wrote:
>  > The sysadmin creates "for iwarp use only" alias interfaces of the form
>  > "devname:iw*" where devname is the native interface name (eg eth0) for the
>  > iwarp netdev device.  The alias label can be anything starting with "iw".
>  > The "iw" immediately after the ':' is the key used by the iwarp driver.
> 
> What's wrong with my suggestion of having the iwarp driver create an
> "iwX" interface to go with the normal "ethX" interface?  It seems
> simpler to me, and there's a somewhat similar precedent with how
> mac80211 devices create both wlan0 and wmaster0 interfaces.
> 
>  - R.

It seemed much more painful for me to implement. :-)

I'll look into this, but I think for this to be done, the changes must 
be in the cxgb3 driver, not the rdma driver, because the guts of the 
netdev struct are all private to cxgb3.  Remember that this interface 
needs to still do non TCP traffic (like ARP and UDP)...

Maybe you have something in mind here that I'm not thinking about?

Steve.

^ permalink raw reply

* Re: 82557/8/9 Ethernet Pro 100 interrupt mitigation support
From: Kok, Auke @ 2007-09-04 15:56 UTC (permalink / raw)
  To: Marc Sigler; +Cc: netdev, linux-net
In-Reply-To: <46DBE2E7.7070906@mediatvcom.com>

Marc Sigler wrote:
> Hello everyone,
> 
> I have several systems with three integrated Intel 82559 (I *think*).
> 
> Does someone know if these boards support hardware interrupt mitigation? 
> I.e. is it possible to configure them to raise an IRQ only if their 
> hardware buffer is full OR if some given time (say 1 ms) has passed and 
> packets are available in their hardware buffer.
> 
> I've been using the eepro100 driver up to now, but I'm about to try the 
> e100 driver. Would I have to use NAPI? Or is this an orthogonal feature?

the software developers manual for this part is available on our e1000.sf.net 
page here:
 
http://downloads.sourceforge.net/e1000/OpenSDM_55x_10c.pdf?modtime=1040083200&big_mirror=0

e100 hardware (as far as I can see from the specs) doesn't support any irq 
mitigation, so you'll need to run in NAPI mode if you want to throttle irq's. 
the in-kernel e100 already runs in NAPI mode, so that's already covered.

beware that the eepro100 driver is scheduled for removal (2.6.25 or so).

Cheers,

Auke

^ permalink raw reply

* RE: 82557/8/9 Ethernet Pro 100 interrupt mitigation support
From: Brandeburg, Jesse @ 2007-09-04 16:12 UTC (permalink / raw)
  To: Kok, Auke-jan H, Marc Sigler; +Cc: netdev, linux-net
In-Reply-To: <46DD8014.7000408@intel.com>

Kok, Auke wrote:
> Marc Sigler wrote:
>> Hello everyone,
>> 
>> I have several systems with three integrated Intel 82559 (I *think*).
>> 
>> Does someone know if these boards support hardware interrupt
>> mitigation? I.e. is it possible to configure them to raise an IRQ
>> only if their hardware buffer is full OR if some given time (say 1
>> ms) has passed and packets are available in their hardware buffer.
>> 
>> I've been using the eepro100 driver up to now, but I'm about to try
>> the e100 driver. Would I have to use NAPI? Or is this an orthogonal
>> feature? 
> 
> the software developers manual for this part is available on our
> e1000.sf.net page here:
> 
>
http://downloads.sourceforge.net/e1000/OpenSDM_55x_10c.pdf?modtime=10400
83200&big_mirror=0
> 
> e100 hardware (as far as I can see from the specs) doesn't support
> any irq mitigation, so you'll need to run in NAPI mode if you want to
> throttle irq's. the in-kernel e100 already runs in NAPI mode, so
> that's already covered. 
> 
> beware that the eepro100 driver is scheduled for removal (2.6.25 or
> so). 

We support mitigation of interrupts in a downloadable microcode on only
a few pieces of hardware (revision id specific) in e100.c (see
e100_setup_ucode)

If you really really wanted mitigation you could probably backport the
microcode from the e100 driver in the 2.4.35 kernel for your specific
hardware.  This driver is versioned 2.X.

Beyond that I can't really offer much help in this case, but would
certainly accept patches!

Jesse

^ permalink raw reply

* Re: [PATCH 1/2]: [NET_SCHED]: Make all rate based scheduler work with TSO.
From: Patrick McHardy @ 2007-09-04 16:23 UTC (permalink / raw)
  To: Bill Fink
  Cc: Jesper Dangaard Brouer, jdb, netdev@vger.kernel.org,
	David S. Miller
In-Reply-To: <20070903233456.0e738183.billfink@mindspring.com>

Bill Fink wrote:
> On Sat, 1 Sep 2007, Jesper Dangaard Brouer wrote:
>
>>On Sat, 1 Sep 2007, Patrick McHardy wrote:
>>
>>>
>>>It still won't work properly with TSO (TBF for example already drops
>>>oversized packets during ->enqueue), but its a good cleanup anyway.
>>
>>Then lets call it a cleanup of the L2T macros.  In the next step we will 
>>fix the different schedulers, to use the ability to lookup larger sized 
>>packets. (I did notice the TBF scheduler would drop oversized packets).
> 
> 
> Hmmm.  I guess this is also why TBF doesn't seem to work with 9000 byte
> jumbo frames.
> 
> [root@lang4 ~]# tc qdisc add dev eth2 root tbf rate 2gbit buffer 5000000 limit 18000


Yes, you need to specify the MTU on the command line for
jumbo frames.

^ permalink raw reply

* Re: [PATCH 2/2]: [NET_SCHED]: Making rate table lookups more flexible.
From: Patrick McHardy @ 2007-09-04 16:25 UTC (permalink / raw)
  To: jdb; +Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org
In-Reply-To: <1188829151.16405.10.camel@localhost.localdomain>

Jesper Dangaard Brouer wrote:
> On Sun, 2007-09-02 at 23:16 +0200, Patrick McHardy wrote:
> 
>>Jesper Dangaard Brouer wrote:
>>
>>>On Sun, 2 Sep 2007, Patrick McHardy wrote:
>>>
>>>Lets focus on the general case, where the functionality actually is 
>>>needed right away.
>>>
>>>In the general case:
>>>
>>>- The rate table needs to be aligned (cell_align=-1).
>>>  (currently, we miscalculates up to 7 bytes on every lookup)
>>
>>We will always do that, thats a consequence of storing the
>>transmission times for multiples of 8b.
> 
> 
> The issue is that we use the lower boundary for calculating the transmit
> cost. Thus, a 15 bytes packet only have a transmit cost of 8 bytes.

I believe this is something that should be fixed anyway,
its better to overestimate than underestimate to stay
in control of the queue. We could additionally make the
rate tables more finegrained (optionally).

>>>- The existing tc overhead calc can be made more accurate.
>>>  (by adding overhead before doing the lookup, instead of the
>>>   current solution where the rate table is modified with its
>>>   limited resolution)
>>
>>Please demonstrate this with patches (one for the overhead
>>calculation, one for the cell_align thing), then we can
>>continue this discussion.
> 
> 
> I have attached a patch for the overhead calculation.

Thanks, I probably won't get to looking into this until
after the netfilter workshop next week.

^ permalink raw reply

* Re: 82557/8/9 Ethernet Pro 100 interrupt mitigation support
From: John Sigler @ 2007-09-04 16:41 UTC (permalink / raw)
  To: jesse.brandeburg; +Cc: auke-jan.h.kok, netdev, linux-net
In-Reply-To: <36D9DB17C6DE9E40B059440DB8D95F520344439B@orsmsx418.amr.corp.intel.com>

Jesse Brandeburg wrote:

> Auke Kok wrote:
> 
>> Marc Sigler wrote:
>> 
>>> I have several systems with three integrated Intel 82559 (I *think*).
>>> 
>>> Does someone know if these boards support hardware interrupt
>>> mitigation? I.e. is it possible to configure them to raise an IRQ
>>> only if their hardware buffer is full OR if some given time (say 1
>>> ms) has passed and packets are available in their hardware buffer.
>>> 
>>> I've been using the eepro100 driver up to now, but I'm about to try
>>> the e100 driver. Would I have to use NAPI? Or is this an orthogonal
>>> feature? 
>> 
>> e100 hardware (as far as I can see from the specs) doesn't support
>> any irq mitigation, so you'll need to run in NAPI mode if you want to
>> throttle irq's. the in-kernel e100 already runs in NAPI mode, so
>> that's already covered. 
>> 
>> beware that the eepro100 driver is scheduled for removal (2.6.25 or so).
> 
> We support mitigation of interrupts in a downloadable microcode on only
> a few pieces of hardware (revision id specific) in e100.c (see
> e100_setup_ucode)

http://lxr.linux.no/source/drivers/net/e100.c#L1176

OK.

How do I tell which revision id I have?

00:08.0 0200: 8086:1229 (rev 08)
00:09.0 0200: 8086:1229 (rev 08)
00:0a.0 0200: 8086:1229 (rev 08)

How much memory is available on the board to bundle packets? 3000 bytes?

> If you really really wanted mitigation you could probably backport the
> microcode from the e100 driver in the 2.4.35 kernel for your specific
> hardware.  This driver is versioned 2.X.

I forgot to mention I'm running 2.6.22.1-rt9.
I'm not sure why you mention 2.4.35?
The problem with e100 is that it fails to properly set up all three 
interfaces, which is why I'm stuck with eepro100.

Regards.

^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: Daniele Venzano @ 2007-09-04 16:56 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: hadi, Mandeep Baines, davem, rick.jones2, msb, netdev, grundler,
	robert.olsson, jeff, nhorman
In-Reply-To: <20070904032036.GA11153@ludhiana>

----- Message d'origine -----
De: Mandeep Singh Baines <mandeep.baines@gmail.com>
Date: Mon, 3 Sep 2007 20:20:36 -0700
Sujet: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition

>Hi Daniele,
>
>Attached is a patch for converting the sis900 driver to NAPI. Please take a
>look at let me know what you think. I'm not 100% sure if I'm handling the
>rotting packet issue correctly or that I have the locking right in tx_timeout.
>These might be areas to look at closely.
>
>I didn't see much saving in interrupts on my machine (too fast, I guess). But
>I still think its beneficial: pushing work out of the interrupt handler into
>a bottom half is a good thing and we no longer need to disable interrupts 
>in start_xmit. 
>
>I did see a significant boost to tx performance by optimizing start_xmit: more
>than double pps in pktgen.
>
>I'm also attaching some test results for various iterations of development.

The patch looks good and I think it can be pushed higher (-mm ?) for some wider 
testing. I don't have the hardware available to do some tests myself, 
unfortunately, but it would be similar to yours anyway.

I'd like to know how this works for people with less memory and slower CPU, but any 
kind of test run will be appreciated.

Thanks, bye.


--
Daniele Venzano
venza@brownhat.org

^ permalink raw reply

* Re: [PATCH] Fix e100 on systems that have cache incoherent DMA
From: Kok, Auke @ 2007-09-04 17:02 UTC (permalink / raw)
  To: David Acker
  Cc: e1000-devel, netdev, Jesse Brandeburg, Milton Miller,
	Scott Feldman, John Ronciak, Jeff Kirsher, Jeff Garzik
In-Reply-To: <20070831205430.7209E46C20E@localhost>

David Acker wrote:
> On the systems that have cache incoherent DMA, including ARM, there is a
> race condition between software allocating a new receive buffer and hardware
> writing into a buffer.  The two race on touching the last Receive Frame
> Descriptor (RFD).  It has its el-bit set and its next link equal to 0.
> When hardware encounters this buffer it attempts to write data to it and
> then update Status Word bits and Actual Count in the RFD.  At the same time
> software may try to clear the el-bit and set the link address to a new buffer.
> 
> Since the entire RFD is once cache-line, the two write operations can collide.
> This can lead to the receive unit stalling or interpreting random memory as
> its receive area.
> 
> The fix is to set the el-bit on and the size to 0 on the next to last buffer
> in the chain.  When the hardware encounters this buffer it stops and does not
> write to it at all.  The hardware issues an RNR interrupt with the receive
> unit in the No Resources state.  Software can write to the tail of the list
> because it knows hardware will stop on the previous descriptor that was
> marked as the end of list.
> 
> Once it has a new next to last buffer prepared, it can clear the el-bit and
> set the size on the previous one.  The race on this buffer is safe since
> the link already points to a valid next buffer and the software can handle
> the race setting the size (assuming aligned 16 bit writes are atomic with
> respect to the DMA read). If the hardware sees the el-bit cleared without
> the size set, it will move on to the next buffer and skip this one.  If it
> sees the size set but the el-bit still set, it will complete that buffer
> and then RNR interrupt and wait.
> 
> Flags are kept in the software descriptor to note if the el bit is set and if
> the size was 0.  When software clears the RFD's el bit and set its size, it
> also clears the el flag but leaves the size was 0 bit set.  This way software
> can identify them when the race may have occurred when cleaning the ring.
> On these descriptors, it looks ahead and if the next one is complete then
> hardware must have skipped the current one.  Logic is added to prevent two
> packets in a row being marked while the receiver is running to avoid running
> in lockstep with the hardware and thereby limiting the required lookahead.
> 
> This is a patch for 2.6.23-rc4.
> 
> Signed-off-by: David Acker <dacker@roinet.com>


thanks David,

I'm going to try to give this some decent testing on x86 in the next two weeks, 
I'll let everyone know how this is going and take a look at the patch a bit 
later in -depth.

Auke


> 
> ---
> 
> --- linux-2.6.23-rc4/drivers/net/e100.c.orig	2007-08-30 13:32:10.000000000 -0400
> +++ linux-2.6.23-rc4/drivers/net/e100.c	2007-08-30 15:42:07.000000000 -0400
> @@ -106,6 +106,13 @@
>   *	the RFD, the RFD must be dma_sync'ed to maintain a consistent
>   *	view from software and hardware.
>   *
> + *	In order to keep updates to the RFD link field from colliding with
> + *	hardware writes to mark packets complete, we use the feature that
> + *	hardware will not write to a size 0 descriptor and mark the previous
> + *	packet as end-of-list (EL).   After updating the link, we remove EL
> + *	and only then restore the size such that hardware may use the
> + *	previous-to-end RFD. 
> + *
>   *	Under typical operation, the  receive unit (RU) is start once,
>   *	and the controller happily fills RFDs as frames arrive.  If
>   *	replacement RFDs cannot be allocated, or the RU goes non-active,
> @@ -281,14 +288,14 @@ struct csr {
>  };
>  
>  enum scb_status {
> +	rus_no_res       = 0x08,
>  	rus_ready        = 0x10,
>  	rus_mask         = 0x3C,
>  };
>  
>  enum ru_state  {
> -	RU_SUSPENDED = 0,
> -	RU_RUNNING	 = 1,
> -	RU_UNINITIALIZED = -1,
> +	ru_stopped = 0,
> +	ru_running = 1,
>  };
>  
>  enum scb_stat_ack {
> @@ -401,10 +408,16 @@ struct rfd {
>  	u16 size;
>  };
>  
> +enum rx_flags {
> +	rx_el = 0x01,
> +	rx_s0 = 0x02,
> +};
> +
>  struct rx {
>  	struct rx *next, *prev;
>  	struct sk_buff *skb;
>  	dma_addr_t dma_addr;
> +	u8 flags;
>  };
>  
>  #if defined(__BIG_ENDIAN_BITFIELD)
> @@ -952,7 +965,7 @@ static void e100_get_defaults(struct nic
>  		((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));
>  
>  	/* Template for a freshly allocated RFD */
> -	nic->blank_rfd.command = cpu_to_le16(cb_el);
> +	nic->blank_rfd.command = 0;
>  	nic->blank_rfd.rbd = 0xFFFFFFFF;
>  	nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
>  
> @@ -1753,18 +1766,48 @@ static int e100_alloc_cbs(struct nic *ni
>  	return 0;
>  }
>  
> -static inline void e100_start_receiver(struct nic *nic, struct rx *rx)
> +static void e100_find_mark_el(struct nic *nic, struct rx *marked_rx, int is_running)
>  {
> -	if(!nic->rxs) return;
> -	if(RU_SUSPENDED != nic->ru_running) return;
> +	struct rx *rx = nic->rx_to_use->prev->prev;
> +	struct rfd *rfd;
> +
> +	if (marked_rx == rx)
> +		return;
> +
> +	rfd = (struct rfd *) rx->skb->data;
> +	rfd->command |= cpu_to_le16(cb_el);
> +	rfd->size = 0;
> +	pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
> +		sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
> +	rx->flags |= (rx_el | rx_s0);
> +
> +	if (!marked_rx)
> +		return;
> +
> +	rfd = (struct rfd *) marked_rx->skb->data;
> +	rfd->command &= ~cpu_to_le16(cb_el);
> +	pci_dma_sync_single_for_device(nic->pdev, marked_rx->dma_addr,
> +		sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
> +
> +	rfd->size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
> +	pci_dma_sync_single_for_device(nic->pdev, marked_rx->dma_addr,
> +		sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
>  
> -	/* handle init time starts */
> -	if(!rx) rx = nic->rxs;
> +	if (is_running)
> +		marked_rx->flags &= ~rx_el;
> +	else
> +		marked_rx->flags &= ~(rx_el | rx_s0);
> +}
> +
> +static inline void e100_start_receiver(struct nic *nic)
> +{
> +	if(!nic->rxs) return;
> +	if (ru_stopped != nic->ru_running) return;
>  
>  	/* (Re)start RU if suspended or idle and RFA is non-NULL */
> -	if(rx->skb) {
> -		e100_exec_cmd(nic, ruc_start, rx->dma_addr);
> -		nic->ru_running = RU_RUNNING;
> +	if (nic->rx_to_clean->skb) {
> +		e100_exec_cmd(nic, ruc_start, nic->rx_to_clean->dma_addr);
> +		nic->ru_running = ru_running;
>  	}
>  }
>  
> @@ -1793,8 +1836,6 @@ static int e100_rx_alloc_skb(struct nic 
>  		struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
>  		put_unaligned(cpu_to_le32(rx->dma_addr),
>  			(u32 *)&prev_rfd->link);
> -		wmb();
> -		prev_rfd->command &= ~cpu_to_le16(cb_el);
>  		pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
>  			sizeof(struct rfd), PCI_DMA_TODEVICE);
>  	}
> @@ -1808,6 +1849,7 @@ static int e100_rx_indicate(struct nic *
>  	struct sk_buff *skb = rx->skb;
>  	struct rfd *rfd = (struct rfd *)skb->data;
>  	u16 rfd_status, actual_size;
> +	u8 status;
>  
>  	if(unlikely(work_done && *work_done >= work_to_do))
>  		return -EAGAIN;
> @@ -1819,9 +1861,47 @@ static int e100_rx_indicate(struct nic *
>  
>  	DPRINTK(RX_STATUS, DEBUG, "status=0x%04X\n", rfd_status);
>  
> -	/* If data isn't ready, nothing to indicate */
> -	if(unlikely(!(rfd_status & cb_complete)))
> +	/* 
> +	 * If data isn't ready, nothing to indicate
> +	 * If both the el and s0 rx flags are set, we have hit the marked
> +	 * buffer but we don't know if hardware has seen it so we check
> +	 * the status.
> +	 * If only the s0 flag is set, we check the next buffer.
> +	 * If it is complete, we know that hardware saw the rfd el bit
> +	 * get cleared but did not see the rfd size get set so it
> +	 * skipped this buffer.  We just return 0 and look at the
> +	 * next buffer.
> +	 * If only the s0 flag is set but the next buffer is
> +	 * not complete, we cleared the el flag as hardware
> +	 * hit this buffer.
> +	 */
> +	if (unlikely(!(rfd_status & cb_complete))) {
> +		u8 maskedFlags = rx->flags & (rx_el | rx_s0);
> +		if (maskedFlags == (rx_el | rx_s0)) {
> +			status = readb(&nic->csr->scb.status);
> +			if (status & rus_no_res)
> +				nic->ru_running = ru_stopped;
> +		} else if (maskedFlags == rx_s0) {
> +			struct rx *next_rx = rx->next;
> +			struct rfd *next_rfd = (struct rfd *)next_rx->skb->data;
> +			pci_dma_sync_single_for_cpu(nic->pdev,
> +				next_rx->dma_addr, sizeof(struct rfd),
> +				PCI_DMA_FROMDEVICE);
> +			if (next_rfd->status & cpu_to_le16(cb_complete)) {
> +				pci_unmap_single(nic->pdev, rx->dma_addr,
> +					RFD_BUF_LEN, PCI_DMA_FROMDEVICE);
> +				dev_kfree_skb_any(skb);
> +				rx->skb = NULL;
> +				rx->flags &= ~rx_s0;
> +				return 0;
> +			} else {
> +				status = readb(&nic->csr->scb.status);
> +				if (status & rus_no_res)
> +					nic->ru_running = ru_stopped;
> +			}
> +		}
>  		return -ENODATA;
> +	}
>  
>  	/* Get actual data size */
>  	actual_size = le16_to_cpu(rfd->actual_size) & 0x3FFF;
> @@ -1832,9 +1912,15 @@ static int e100_rx_indicate(struct nic *
>  	pci_unmap_single(nic->pdev, rx->dma_addr,
>  		RFD_BUF_LEN, PCI_DMA_FROMDEVICE);
>  
> -	/* this allows for a fast restart without re-enabling interrupts */
> -	if(le16_to_cpu(rfd->command) & cb_el)
> -		nic->ru_running = RU_SUSPENDED;
> +	/*
> +	 * This happens when hardward sees the rfd el flag set
> +	 * but then sees the rfd size set as well
> +	 */
> +	if (le16_to_cpu(rfd->command) & cb_el) {
> +		status = readb(&nic->csr->scb.status);
> +		if (status & rus_no_res)
> +			nic->ru_running = ru_stopped;
> +	}
>  
>  	/* Pull off the RFD and put the actual data (minus eth hdr) */
>  	skb_reserve(skb, sizeof(struct rfd));
> @@ -1865,32 +1951,34 @@ static int e100_rx_indicate(struct nic *
>  static void e100_rx_clean(struct nic *nic, unsigned int *work_done,
>  	unsigned int work_to_do)
>  {
> -	struct rx *rx;
> +	struct rx *rx, *marked_rx;
>  	int restart_required = 0;
> -	struct rx *rx_to_start = NULL;
> -
> -	/* are we already rnr? then pay attention!!! this ensures that
> -	 * the state machine progression never allows a start with a
> -	 * partially cleaned list, avoiding a race between hardware
> -	 * and rx_to_clean when in NAPI mode */
> -	if(RU_SUSPENDED == nic->ru_running)
> -		restart_required = 1;
> +	int err = 0;
>  
>  	/* Indicate newly arrived packets */
>  	for(rx = nic->rx_to_clean; rx->skb; rx = nic->rx_to_clean = rx->next) {
> -		int err = e100_rx_indicate(nic, rx, work_done, work_to_do);
> -		if(-EAGAIN == err) {
> -			/* hit quota so have more work to do, restart once
> -			 * cleanup is complete */
> -			restart_required = 0;
> +		err = e100_rx_indicate(nic, rx, work_done, work_to_do);
> +		/* Hit quota or no more to clean */
> +		if(-EAGAIN == err || -ENODATA == err)
>  			break;
> -		} else if(-ENODATA == err)
> -			break; /* No more to clean */
>  	}
>  
> -	/* save our starting point as the place we'll restart the receiver */
> -	if(restart_required)
> -		rx_to_start = nic->rx_to_clean;
> +	/*
> +	 * On EAGAIN, hit quota so have more work to do, restart once
> +	 * cleanup is complete.
> +	 * Else, are we already rnr? then pay attention!!! this ensures that
> +	 * the state machine progression never allows a start with a
> +	 * partially cleaned list, avoiding a race between hardware
> +	 * and rx_to_clean when in NAPI mode
> +	 */
> +	if(-EAGAIN != err && ru_stopped == nic->ru_running)
> +		restart_required = 1;
> +
> +	marked_rx = nic->rx_to_use->prev->prev;
> +	if (!(marked_rx->flags & rx_el)) {
> +		marked_rx = marked_rx->prev;
> +		BUG_ON(!marked_rx->flags & rx_el);
> +	}
>  
>  	/* Alloc new skbs to refill list */
>  	for(rx = nic->rx_to_use; !rx->skb; rx = nic->rx_to_use = rx->next) {
> @@ -1898,10 +1986,12 @@ static void e100_rx_clean(struct nic *ni
>  			break; /* Better luck next time (see watchdog) */
>  	}
>  
> +	e100_find_mark_el(nic, marked_rx, !restart_required);
> +
>  	if(restart_required) {
>  		// ack the rnr?
>  		writeb(stat_ack_rnr, &nic->csr->scb.stat_ack);
> -		e100_start_receiver(nic, rx_to_start);
> +		e100_start_receiver(nic);
>  		if(work_done)
>  			(*work_done)++;
>  	}
> @@ -1912,8 +2002,6 @@ static void e100_rx_clean_list(struct ni
>  	struct rx *rx;
>  	unsigned int i, count = nic->params.rfds.count;
>  
> -	nic->ru_running = RU_UNINITIALIZED;
> -
>  	if(nic->rxs) {
>  		for(rx = nic->rxs, i = 0; i < count; rx++, i++) {
>  			if(rx->skb) {
> @@ -1935,7 +2023,6 @@ static int e100_rx_alloc_list(struct nic
>  	unsigned int i, count = nic->params.rfds.count;
>  
>  	nic->rx_to_use = nic->rx_to_clean = NULL;
> -	nic->ru_running = RU_UNINITIALIZED;
>  
>  	if(!(nic->rxs = kcalloc(count, sizeof(struct rx), GFP_ATOMIC)))
>  		return -ENOMEM;
> @@ -1950,7 +2037,9 @@ static int e100_rx_alloc_list(struct nic
>  	}
>  
>  	nic->rx_to_use = nic->rx_to_clean = nic->rxs;
> -	nic->ru_running = RU_SUSPENDED;
> +	nic->ru_running = ru_stopped;
> +
> +	e100_find_mark_el(nic, NULL, 0);
>  
>  	return 0;
>  }
> @@ -1971,8 +2060,8 @@ static irqreturn_t e100_intr(int irq, vo
>  	iowrite8(stat_ack, &nic->csr->scb.stat_ack);
>  
>  	/* We hit Receive No Resource (RNR); restart RU after cleaning */
> -	if(stat_ack & stat_ack_rnr)
> -		nic->ru_running = RU_SUSPENDED;
> +	if (stat_ack & stat_ack_rnr)
> +		nic->ru_running = ru_stopped;
>  
>  	if(likely(netif_rx_schedule_prep(netdev))) {
>  		e100_disable_irq(nic);
> @@ -2065,7 +2154,7 @@ static int e100_up(struct nic *nic)
>  	if((err = e100_hw_init(nic)))
>  		goto err_clean_cbs;
>  	e100_set_multicast_list(nic->netdev);
> -	e100_start_receiver(nic, NULL);
> +	e100_start_receiver(nic);
>  	mod_timer(&nic->watchdog, jiffies);
>  	if((err = request_irq(nic->pdev->irq, e100_intr, IRQF_SHARED,
>  		nic->netdev->name, nic->netdev)))
> @@ -2146,7 +2235,7 @@ static int e100_loopback_test(struct nic
>  		mdio_write(nic->netdev, nic->mii.phy_id, MII_BMCR,
>  			BMCR_LOOPBACK);
>  
> -	e100_start_receiver(nic, NULL);
> +	e100_start_receiver(nic);
>  
>  	if(!(skb = netdev_alloc_skb(nic->netdev, ETH_DATA_LEN))) {
>  		err = -ENOMEM;

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

^ permalink raw reply

* Re: [PATCH] [sis900] convert to NAPI, WAS Re: pktgen terminating condition
From: Mandeep Baines @ 2007-09-04 17:21 UTC (permalink / raw)
  To: hadi
  Cc: Daniele Venzano, davem, rick.jones2, msb, netdev, grundler,
	robert.olsson, jeff, nhorman
In-Reply-To: <1188911036.4483.10.camel@localhost>

On 9/4/07, jamal <hadi@cyberus.ca> wrote:
> On Mon, 2007-03-09 at 20:20 -0700, Mandeep Singh Baines wrote:
>
> > I didn't see much saving in interrupts on my machine (too fast, I guess).
>
> You could try the idea suggested by Dave earlier and just turn interupts
> for every nth packet. That should cut down the numbers.
>

I wanted to do that but I couldn't figure out how to free the last skb
if it happened to be a transmit that didn't generate a tx completion
interrupt.

Let's say I interrupt every 4th packet.I've already sent packets 0 to
4. Now I send packets 5 and 6. I can't figure out how to ensure that
the skb's for these packets get freed in a deterministic amount of
time if there is no interrupt? They will get freed when packet 8 gets
transmitted but there is no upper bound on when that will happen.

Maybe I could create a tx skb cleanup timer that I kickoff in
hard_start_xmit(). Every call to hard_start_xmit would reset the
timer. I could try this for a future patch. Not sure if the extra code
would cost me more than I get back in savings.

> > I did see a significant boost to tx performance by optimizing start_xmit: more
> > than double pps in pktgen.
>
> 148Kpps on a slow piece of hardware aint bad - Good Stuff. I wonder how
> much CPU is being abused.
>
> If you wanna go one extra mile (separate future patch): get rid of that
> tx lock and use netif_tx_lock on the interupt path. Look at some sane
> driver like tg3 for reference.
>

Cool. I'll check out the tg3.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox