Netdev List
 help / color / mirror / Atom feed
* Re: DM9000_IRQ_FLAGS
From: Daniel Mack @ 2008-01-15 11:42 UTC (permalink / raw)
  To: Remy Bohmer; +Cc: Ben Dooks, netdev
In-Reply-To: <3efb10970712200154u4c224071q6f5d5d7bb3bf0aa1@mail.gmail.com>

On Thu, Dec 20, 2007 at 10:54:34AM +0100, Remy Bohmer wrote:
> > I did have a go at trying to get people to pass the information this
> > way, but it seem to be ignored last time I sent it. I can dig out the
> > code that converts resource->flags to IRQT_ flags.
> 
> I thought this issue was already solved by using set_irq_type() in the
> BSP, just like all the other boards do...

Yes, exactly. I was trapped by another board file which specifies
interrupt types using 'struct resources'->flags for that
(arch/arm/mach-pxa/trizeps4.c) which is, as you pointed out earlier, not
the way to go. 

Sorry for the noise.

Best regards,
Daniel


^ permalink raw reply

* RE: [PATCH 0/3] UCC TDM driver for MPC83xx platforms
From: Aggrwal Poonam @ 2008-01-15 12:05 UTC (permalink / raw)
  To: Kumar Gala, Andrew Morton
  Cc: Phillips Kim, sfr, rubini, linux-ppcdev, netdev, linux-kernel,
	Barkowski Michael, Kalra Ashish, Cutler Richard, Suresh PV
In-Reply-To: <5927F7A4-D42A-40F6-AE9C-EDA34738A752@kernel.crashing.org>

Thanks Kumar/Morton/Kim

I shall make a small paragraph which describes the TDM driver
architecture and the interfaces it exposes.

As far as 8315 TDM is concerned it is a non QE driver and quite
different from this except for the functionality and external interface
it exposes.

Right now TDM is not a full-fledged bus driver which probably can be
done may be very similar to SPI.

Please give your suggestions on this.

With Regards
Poonam 
 
 

-----Original Message-----
From: Kumar Gala [mailto:galak@kernel.crashing.org] 
Sent: Tuesday, January 15, 2008 9:01 AM
To: Andrew Morton
Cc: Phillips Kim; Aggrwal Poonam; sfr@canb.auug.org.au;
rubini@vision.unipv.it; linux-ppcdev@ozlabs.kernel.org;
netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Barkowski Michael;
Kalra Ashish; Cutler Richard
Subject: Re: [PATCH 0/3] UCC TDM driver for MPC83xx platforms


On Jan 14, 2008, at 3:15 PM, Andrew Morton wrote:

> On Mon, 14 Jan 2008 12:00:51 -0600
> Kim Phillips <kim.phillips@freescale.com> wrote:
>
>> On Thu, 10 Jan 2008 21:41:20 -0700
>> "Aggrwal Poonam" <Poonam.Aggrwal@freescale.com> wrote:
>>
>>> Hello  All
>>>
>>> I am waiting for more feedback on the patches.
>>>
>>> If there are no objections please consider them for 2.6.25.
>>>
>> if this isn't going to go through Alessandro Rubini/misc drivers, can

>> it go through the akpm/mm tree?
>>
>
> That would work.  But it might be more appropriate to go Kumar-
> >paulus->Linus.

I'm ok w/taking the arch/powerpc bits, but I"m a bit concerned about  
the driver itself.  I'm wondering if we need a TDM framework in the  
kernel.

I guess if Poonam could possibly describe how this driver is actually  
used that would be helpful.  I see we have 8315 with a discrete TDM  
block and I'm guessing 82xx/85xx based CPM parts of some form of TDM  
as well.

- k

^ permalink raw reply

* Re: Not understand some in htb_do_events function
From: Martin Devera @ 2008-01-15 11:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478C87D9.1010305@trash.net>

Patrick McHardy wrote:
> Badalian Vyacheslav wrote:
>> Hello all.
>> I have many messages like "htb: too many events !" in dmesg.
>>
>> Try to see code and find that function try do 500 events at call.
>> Hm... may anyone ask why 500? Why its not dynamic value based on 
>> performance of PC?
> 
> 
> Thats a good question, I wonder why it is limited at all.
> Martin, any hints?
> 
> 

Hi, I recently replied someone to the same question:

 > it is possible when during one jiffie (1 or 10ms) more than 500 classes
 > changed its state. It is meant to protect your system from livelock.
 > The constant should be set to something like
 > bogomips/bogocomplexity_of_state_change but it was not done.

the solution I have in my mind is to change
    if (net_ratelimit())
                  printk(KERN_WARNING "htb: too many events !\n");
          return HZ/10;
  to
    return 1;

to drain extra events asap. It the time of writing I was not able to
come with better solution and there were more bugs related to this
part of code than now.
We want way to smooth big burst of events over more dequeue invocations
in order to not slow dequeue too much. Constant 500 is max. allowed
"slowdown" of dequeue.
Any bright idea how to do it more elegant, Patrick ?

brgds, devik

^ permalink raw reply

* [PATCH] net: EMAC: Fix problem with mtu > 4080 on non TAH equipped 4xx PPC's
From: Stefan Roese @ 2008-01-15 12:40 UTC (permalink / raw)
  To: linuxppc-dev, netdev; +Cc: ebs, benh

Currently, all non TAH equipped 4xx PPC's call emac_start_xmit() upon
xmit. This routine doesn't check if the frame length exceeds the max.
MAL buffer size.

This patch now changes the driver to call emac_start_xmit_sg() on all
platforms and not only the TAH equipped ones (440GX). This enables an
MTU of 9000 instead 4080.

Tested on Kilauea (405EX) with gbit link -> jumbo frames enabled.

Signed-off-by: Stefan Roese <sr@denx.de>
---
Eugene & Ben, do you see any problems with this patch? If not, then I'll
send another version for the newemac driver too.

Thanks.

 drivers/net/ibm_emac/ibm_emac_core.c |   10 ++--------
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ibm_emac/ibm_emac_core.c b/drivers/net/ibm_emac/ibm_emac_core.c
index 73664f2..198de44 100644
--- a/drivers/net/ibm_emac/ibm_emac_core.c
+++ b/drivers/net/ibm_emac/ibm_emac_core.c
@@ -1089,7 +1089,6 @@ static int emac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	return emac_xmit_finish(dev, len);
 }
 
-#if defined(CONFIG_IBM_EMAC_TAH)
 static inline int emac_xmit_split(struct ocp_enet_private *dev, int slot,
 				  u32 pd, int len, int last, u16 base_ctrl)
 {
@@ -1203,9 +1202,6 @@ static int emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev)
 	DBG2("%d: stopped TX queue" NL, dev->def->index);
 	return 1;
 }
-#else
-# define emac_start_xmit_sg	emac_start_xmit
-#endif	/* !defined(CONFIG_IBM_EMAC_TAH) */
 
 /* BHs disabled */
 static void emac_parse_tx_error(struct ocp_enet_private *dev, u16 ctrl)
@@ -2163,11 +2159,9 @@ static int __init emac_probe(struct ocp_device *ocpdev)
 
 	/* Fill in the driver function table */
 	ndev->open = &emac_open;
-	if (dev->tah_dev) {
-		ndev->hard_start_xmit = &emac_start_xmit_sg;
+	ndev->hard_start_xmit = &emac_start_xmit_sg;
+	if (dev->tah_dev)
 		ndev->features |= NETIF_F_IP_CSUM | NETIF_F_SG;
-	} else
-		ndev->hard_start_xmit = &emac_start_xmit;
 	ndev->tx_timeout = &emac_full_tx_reset;
 	ndev->watchdog_timeo = 5 * HZ;
 	ndev->stop = &emac_close;
-- 
1.5.4.rc3


^ permalink raw reply related

* Re: Not understand some in htb_do_events function
From: Badalian Vyacheslav @ 2008-01-15 13:51 UTC (permalink / raw)
  To: Martin Devera; +Cc: netdev, kaber
In-Reply-To: <478CA02A.1070308@cdi.cz>

Martin Devera пишет:
> Patrick McHardy wrote:
>> Badalian Vyacheslav wrote:
>>> Hello all.
>>> I have many messages like "htb: too many events !" in dmesg.
>>>
>>> Try to see code and find that function try do 500 events at call.
>>> Hm... may anyone ask why 500? Why its not dynamic value based on 
>>> performance of PC?
>>
>>
>> Thats a good question, I wonder why it is limited at all.
>> Martin, any hints?
>>
>>
>
> Hi, I recently replied someone to the same question:
>
> > it is possible when during one jiffie (1 or 10ms) more than 500 classes
> > changed its state. It is meant to protect your system from livelock.
> > The constant should be set to something like
> > bogomips/bogocomplexity_of_state_change but it was not done.
>
> the solution I have in my mind is to change
>    if (net_ratelimit())
>                  printk(KERN_WARNING "htb: too many events !\n");
>          return HZ/10;
>  to
>    return 1;
>
> to drain extra events asap. It the time of writing I was not able to
> come with better solution and there were more bugs related to this
> part of code than now.
> We want way to smooth big burst of events over more dequeue invocations
> in order to not slow dequeue too much. Constant 500 is max. allowed
> "slowdown" of dequeue.
> Any bright idea how to do it more elegant, Patrick ?
>
> brgds, devik

Hmm.. i simple see that if i get messages like this in dmesg my PC limit 
speed, but i have many free CPU and HTB classes have not full bandwidth 
load....
If i delete half rules - i get bandwidth that i wont. I wrong understand 
that if we have this limit (constant 500) my PC can't get full performance?

Look scheme (example in my memory... not real numbers... for idea only):
HTB (default 1)
- Class 1 (rate 250mbs cell 500 prio 1)
---- Here 8000 HTB rules (parent 1) (rate 1kbs, cell from 200kbs to 5mbs)
- Class 2 (rate 250mbs cell 500 prio 1)

I always go to 2.
try ttcp.
current rate in class 1 - above 150mbs
current rate in class 2 - above 200mbs
Load avg above 0.10 - 0.25
Idle - 80%
 
Try delete half rules
current rate in class 1 - above 100mbs
current rate in class 2 - above 400mbs
Load avg above 0.10 - 0.25
Idle - 80%

All traffic in class 2 do i witch ttcp! its not deleted clients from 
class 1.

Any idea? Thanks.

^ permalink raw reply

* Re: Not understand some in htb_do_events function
From: Martin Devera @ 2008-01-15 13:54 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev, kaber
In-Reply-To: <478CBA44.1090909@bigtelecom.ru>

Badalian Vyacheslav wrote:
> Martin Devera пишет:
>> Patrick McHardy wrote:
>>> Badalian Vyacheslav wrote:
>>>> Hello all.
>>>> I have many messages like "htb: too many events !" in dmesg.
>>>>
>>>> Try to see code and find that function try do 500 events at call.
>>>> Hm... may anyone ask why 500? Why its not dynamic value based on 
>>>> performance of PC?
>>>
>>>
>>> Thats a good question, I wonder why it is limited at all.
>>> Martin, any hints?
>>>
>>>
>>
>> Hi, I recently replied someone to the same question:
>>
>> > it is possible when during one jiffie (1 or 10ms) more than 500 classes
>> > changed its state. It is meant to protect your system from livelock.
>> > The constant should be set to something like
>> > bogomips/bogocomplexity_of_state_change but it was not done.
>>
>> the solution I have in my mind is to change
>>    if (net_ratelimit())
>>                  printk(KERN_WARNING "htb: too many events !\n");
>>          return HZ/10;
>>  to
>>    return 1;
>>
>> to drain extra events asap. It the time of writing I was not able to
>> come with better solution and there were more bugs related to this
>> part of code than now.
>> We want way to smooth big burst of events over more dequeue invocations
>> in order to not slow dequeue too much. Constant 500 is max. allowed
>> "slowdown" of dequeue.
>> Any bright idea how to do it more elegant, Patrick ?
>>
>> brgds, devik
> 
> Hmm.. i simple see that if i get messages like this in dmesg my PC limit 
> speed, but i have many free CPU and HTB classes have not full bandwidth 
> load....
> If i delete half rules - i get bandwidth that i wont. I wrong understand 
> that if we have this limit (constant 500) my PC can't get full performance?
> 
> Look scheme (example in my memory... not real numbers... for idea only):
> HTB (default 1)
> - Class 1 (rate 250mbs cell 500 prio 1)
> ---- Here 8000 HTB rules (parent 1) (rate 1kbs, cell from 200kbs to 5mbs)
> - Class 2 (rate 250mbs cell 500 prio 1)
> 
> I always go to 2.
> try ttcp.
> current rate in class 1 - above 150mbs
> current rate in class 2 - above 200mbs
> Load avg above 0.10 - 0.25
> Idle - 80%
> 
> Try delete half rules
> current rate in class 1 - above 100mbs
> current rate in class 2 - above 400mbs
> Load avg above 0.10 - 0.25
> Idle - 80%
> 
> All traffic in class 2 do i witch ttcp! its not deleted clients from 
> class 1.
> 
> Any idea? Thanks.

The change above (returning 1 instead of HZ/10) should help, it was tested. Also
you CAN increase the 500 to say 2000. I didn't the change as I'm not sure it is
good generic fix...

^ permalink raw reply

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: Frans Pop @ 2008-01-15 14:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20080114.215317.38045859.davem@davemloft.net>

On Tuesday 15 January 2008, David Miller wrote:
> From: Frans Pop <elendil@planet.nl>
> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>
> Does this make the problem go away?

Yes, it very much looks like that solves it.
I ran with the patch for 6 hours or so without any errors. I then switched 
back to an unpatched kernel and they reappeared immediately.

> (Note this isn't the final correct patch we should apply.  There
>  is no reason why this revert back to the older ->poll() logic
>  here should have any effect on the TX hang triggering...)

s/no reason/no obvious reason/ ? ;-)

Cheers,
FJP

^ permalink raw reply

* [patch v3] Documentation: add a guideline for hard_start_xmit method
From: Matti Linnanvuori @ 2008-01-15 14:25 UTC (permalink / raw)
  To: netdev, jgarzik

From: Matti Linnanvuori <mattilinnanvuori@yahoo.com>

Add a guideline not to modify SKBs.

Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com>

---

--- a/Documentation/networking/driver.txt 2008-01-13 07:14:01.608291500 +0200
+++ b/Documentation/networking/driver.txt 2008-01-15 16:16:14.636054000 +0200
@@ -61,7 +61,10 @@ Transmit path guidelines:
 2) Do not forget to update netdev->trans_start to jiffies after
    each new tx packet is given to the hardware.
 
-3) Do not forget that once you return 0 from your hard_start_xmit
+3) A hard_start_xmit method must not modify the shared parts of a
+   cloned SKB.
+
+4) Do not forget that once you return 0 from your hard_start_xmit
    method, it is your driver's responsibility to free up the SKB
    and in some finite amount of time.


      ____________________________________________________________________________________
Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hs

^ permalink raw reply

* Re: [PATCH 2/4] [ROSE] rose_get_route() template
From: Bernard Pidoux @ 2008-01-15 14:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ralf Baechle DL5RB, Alexey Dobriyan, David Miller,
	Linux Netdev List
In-Reply-To: <47893127.2080606@ccr.jussieu.fr>

Hi,

I wrote a "simplified" get_route() function.
It is declared as static following judicious Eric's remark.

Then, the following patch of include/net/rose.h is no more
necessary.

A new commit for rose_get_route() will be presented in next
message.

Thank you Eric for pushing me to reexamine my code.

Regards,

Bernard P.



Bernard Pidoux wrote:
> 
> 
> Eric Dumazet wrote:
>> Bernard Pidoux a écrit :
>>>  From 46bccce1e660a39bcc8f8cf87d4c17de33f4ba48 Mon Sep 17 00:00:00 2001
>>> From: Bernard Pidoux <f6bvp@amsat.org>
>>> Date: Thu, 10 Jan 2008 23:01:46 +0100
>>> Subject: [PATCH 2/4] [ROSE] template declaration for rose_get_route()
>>>
>>> Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
>>> ---
>>>  include/net/rose.h |    1 +
>>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/net/rose.h b/include/net/rose.h
>>> index e5bb084..d3ab453 100644
>>> --- a/include/net/rose.h
>>> +++ b/include/net/rose.h
>>> @@ -202,6 +202,7 @@ extern struct net_device *rose_dev_first(void);
>>>  extern struct net_device *rose_dev_get(rose_address *);
>>>  extern struct rose_route *rose_route_free_lci(unsigned int, struct 
>>> rose_neigh *);
>>>  extern struct rose_neigh *rose_get_neigh(rose_address *, unsigned 
>>> char *, unsigned char *);
>>> +extern struct rose_neigh *rose_get_route(rose_address *, unsigned 
>>> char *, unsigned char *);
>>>  extern int  rose_rt_ioctl(unsigned int, void __user *);
>>>  extern void rose_link_failed(ax25_cb *, int);
>>>  extern int  rose_route_frame(struct sk_buff *, ax25_cb *);
>>> -- 
>>
>> Strange... if rose_get_route() is used only in net/rose/rose_route.c, 
>> why dont you define it static, and not extern in include/net/rose.h ?
>>
>>
>>
> 
> I agree. You are perfectly right.
> There is no need to declare rose_get_route() external.
> I stupidly copied rose_get_neigh()definition from which I derived 
> rose_get_route();
> 
> Also I am not sure that setting cause and diagnostic is necessary, as 
> they are not used by the calling function.
> 
> By the way. I made a typo in [PATCH 4/4].
> 
> Instead of "Initial connection to rose neighbour nodes was unusually,
> t0 timer was blocked and application program could not
> use socket."
> 
> One should read
> "Initial connection to rose neighbour nodes was unusually,
> long, t0 timer was blocked and application program could not
> use socket."
> 
> Bernard P.
> 
> 
> 


^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-15 14:47 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20080115071950.GA1696@ff.dom.local>

Jarek Poplawski wrote:

> IMHO, checking this with a current stable, which probably you are going
> to do some day, anyway, should be 100% acceptable: giving some input to
> netdev, while still working for yourself.

While I would love to do this, it's not that simple.

Some of our hardware is not supported on mainline, so we need per-kernel 
version patches to even bring up the blade.  The blades netboot via a 
jumbo-frame network, so kernel extensions are needed to handle setting 
the MTU before mounting the rootfs.  The blade in question uses CKRM 
which doesn't exist for newer kernels so the problem may simply be 
hidden by scheduling differences.  The userspace application uses other 
kernel features that are not in mainline (and likely some of them won't 
ever be in mainline--I know because I've tried).

Basically, the number of changes required for our environment makes it 
difficult to just boot up the latest kernel.

Chris

^ permalink raw reply

* [PATCH 1/4] [ROSE] simplified rose_get_route()
From: Bernard Pidoux @ 2008-01-15 15:00 UTC (permalink / raw)
  To: Ralf Baechle DL5RB, Alexey Dobriyan, David Miller,
	Linux Netdev List, Eric Dumazet
In-Reply-To: <47630274.1080706@ccr.jussieu.fr>

Hi,

This patch is send to replace preceding commit :

 From fd66cc115e058b2fc63a0e26aa73f1d27113105a Mon Sep 17 00:00:00 2001
From: Bernard Pidoux <f6bvp@amsat.org>
Date: Thu, 10 Jan 2008 23:10:44 +0100
Subject: [PATCH 1/4] [ROSE] new rose_get_route() function


I removed unnecessary lines of code copied from rose_get_neigh().
The function is now called with fewer arguments and if
a NULL is returned then the next call to rose_transmit_clear_request()
will manage the error code. I verified that the frame routing was
actually handled as required on a FPAC/ROSE node with heavy BBS traffic 
through 8 adjacent nodes connected through radio ports and Internet 
AXUDP / AXIP connections.


 From 6708b3853a5ecf5c185942b3757223e451af1960 Mon Sep 17 00:00:00 2001
From: Bernard Pidoux <f6bvp@amsat.org>
Date: Sun, 13 Jan 2008 20:25:40 +0100
Subject: [PATCH 1/4] [ROSE] simplified rose_get_route()

rose_get_neigh() was called by two different functions.
Firstly, by rose_connect() in order to establish connections to
adjacent rose nodes. This worked correctly.
Secondly, it was called by rose_route_frame() to find a route
via an adjacent node. This was not working efficiently, for the
whole node neighbour list was not checked and the proper test
was not performed in order to check if a node was already connected.
We create new function rose_get_route() to achieve this.
It returns a ROSE node address for sending a frame via a connected node
to the specified destination address.If a NULL is returned, the
previous program behavior is performed. ROSE tries to initiate a connect 
to the appropriate adjacent node.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
---
  net/rose/rose_route.c |   24 +++++++++++++++++++++---
  1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 540c0f2..5d6fac4 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -662,6 +662,25 @@ struct rose_route *rose_route_free_lci(unsigned int 
lci, struct rose_neigh *neig
  }

  /*
+ *     Find an opened route given a ROSE address.
+ */
+static struct rose_neigh *rose_get_route(rose_address *addr)
+{
+       struct rose_node *node;
+       int i;
+
+       for (node = rose_node_list; node != NULL; node = node->next) {
+               if (rosecmpm(addr, &node->address, node->mask) == 0) {
+                       for (i = 0; i < node->count; i++) {
+                               if (node->neighbour[i]->restarted)
+                                       return node->neighbour[i];
+                       }
+               }
+       }
+       return NULL;
+}
+
+/*
   *     Find a neighbour given a ROSE address.
   */
  struct rose_neigh *rose_get_neigh(rose_address *addr, unsigned char 
*cause,
@@ -842,7 +861,6 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
         struct sock *sk;
         unsigned short frametype;
         unsigned int lci, new_lci;
-       unsigned char cause, diagnostic;
         struct net_device *dev;
         int len, res = 0;
         char buf[11];
@@ -1019,8 +1037,8 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb 
*ax25)
                 rose_route = rose_route->next;
         }

-       if ((new_neigh = rose_get_neigh(dest_addr, &cause, &diagnostic)) 
== NULL) {
-               rose_transmit_clear_request(rose_neigh, lci, cause, 
diagnostic);
+       if ((new_neigh = rose_get_route(dest_addr)) == NULL) {
+               rose_transmit_clear_request(rose_neigh, lci, 
ROSE_NOT_OBTAINABLE, 0);
                 goto out;
         }

--
1.5.3.7


^ permalink raw reply related

* Re: questions on NAPI processing latency and dropped network packets
From: Vlad Yasevich @ 2008-01-15 15:09 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Eric Dumazet, Ray Lee, netdev, linux-kernel
In-Reply-To: <478BBFE3.9030707@nortel.com>

Chris Friesen wrote:
> Eric Dumazet wrote:
>> Chris Friesen a écrit :
> 
>>> Based on the profiling information we're spending time in 
>>> sctp_endpoint_lookup_assoc() which doesn't actually use hashes, so I 
>>> can't see how the hash would be related.  I'm pretty new to SCTP 
>>> though, so I may be missing something.
>>
>> Well, it does use hashes :)
>>
>>        hash = sctp_assoc_hashfn(ep->base.bind_addr.port, rport);
>>        head = &sctp_assoc_hashtable[hash];
>>        read_lock(&head->lock);
>>        sctp_for_each_hentry(epb, node, &head->chain) {
>>            /* maybe your machine is traversing here a *really* long 
>> chain */
>>            }
> 
> 
> The latest released kernel doesn't have this code, it was only added in 
> November.  The SCTP maintainer just pointed me to the patch, and made 
> some other suggestions as well.
> 

Yes,  the hash code only got added to 2.6.24.  Before that, it was
a linear list traversal which sucked.

I need to take a look at the locking to see if we can further reduce
the time that bottom half is disabled.

-vlad

^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Radoslaw Szkodzinski @ 2008-01-15 15:17 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Jarek Poplawski, David Miller, netdev, linux-kernel
In-Reply-To: <478CC76B.1020804@nortel.com>

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

On Tue, 15 Jan 2008 08:47:07 -0600
"Chris Friesen" <cfriesen@nortel.com> wrote:

> Jarek Poplawski wrote:
> 
> > IMHO, checking this with a current stable, which probably you are going
> > to do some day, anyway, should be 100% acceptable: giving some input to
> > netdev, while still working for yourself.
> 
> While I would love to do this, it's not that simple.
> 
> Some of our hardware is not supported on mainline, so we need per-kernel 
> version patches to even bring up the blade.  The blades netboot via a 
> jumbo-frame network, so kernel extensions are needed to handle setting 
> the MTU before mounting the rootfs.

Why? Can't you use a small initramfs to set it up?

> The blade in question uses CKRM 
> which doesn't exist for newer kernels so the problem may simply be 
> hidden by scheduling differences.

Current spiritual successor is Ingo's realtime patchset I guess.

> The userspace application uses other 
> kernel features that are not in mainline (and likely some of them won't 
> ever be in mainline--I know because I've tried).

What would these be? Some proc or sysfs files that were removed or
renamed?
Maybe they can be worked around w/o changing the application at all, or
very minor changes.

Also, be sure to check if there are pauses with other CPU hogs.

> Basically, the number of changes required for our environment makes it 
> difficult to just boot up the latest kernel.

Yes, adding an initramfs is non-trivial (some busybox + shell
scripting), but not all that hard either.
And maintaining one is easier than patching the kernel.
(usually 0 maintenance, maybe minor fixes sometimes)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [NETFILTER]: Supress some sparse warnings
From: Patrick McHardy @ 2008-01-15 15:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org, Netfilter Development Mailinglist
In-Reply-To: <20080115124046.1c10dfeb.dada1@cosmosbay.com>

Eric Dumazet wrote:
> Hi Patrick
> 
> Please find some sparse cleanups, against net-2.6.25
> 
> Thank you
> 
> [NETFILTER]: Supress some sparse warnings
> 
>   CHECK   net/netfilter/nf_conntrack_expect.c
> net/netfilter/nf_conntrack_expect.c:429:13: warning: context imbalance in 'exp_seq_start' - wrong count at exit
> net/netfilter/nf_conntrack_expect.c:441:13: warning: context imbalance in 'exp_seq_stop' - unexpected unlock
>   CHECK   net/netfilter/nf_log.c
> net/netfilter/nf_log.c:105:13: warning: context imbalance in 'seq_start' - wrong count at exit
> net/netfilter/nf_log.c:125:13: warning: context imbalance in 'seq_stop' - unexpected unlock
>   CHECK   net/netfilter/nfnetlink_queue.c
> net/netfilter/nfnetlink_queue.c:363:7: warning: symbol 'size' shadows an earlier one
> net/netfilter/nfnetlink_queue.c:217:9: originally declared here
> net/netfilter/nfnetlink_queue.c:847:13: warning: context imbalance in 'seq_start' - wrong count at exit
> net/netfilter/nfnetlink_queue.c:859:13: warning: context imbalance in 'seq_stop' - unexpected unlock

Applied, thanks Eric.


^ permalink raw reply

* Re: Not understand some in htb_do_events function
From: Patrick McHardy @ 2008-01-15 15:54 UTC (permalink / raw)
  To: Martin Devera; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478CA02A.1070308@cdi.cz>

Martin Devera wrote:
> Patrick McHardy wrote:
>> Badalian Vyacheslav wrote:
>>> Hello all.
>>> I have many messages like "htb: too many events !" in dmesg.
>>>
>>> Try to see code and find that function try do 500 events at call.
>>> Hm... may anyone ask why 500? Why its not dynamic value based on 
>>> performance of PC?
>>
>>
>> Thats a good question, I wonder why it is limited at all.
>> Martin, any hints?
>>
> Hi, I recently replied someone to the same question:
> 
>  > it is possible when during one jiffie (1 or 10ms) more than 500 classes
>  > changed its state. It is meant to protect your system from livelock.
>  > The constant should be set to something like
>  > bogomips/bogocomplexity_of_state_change but it was not done.
> 
> the solution I have in my mind is to change
>    if (net_ratelimit())
>                  printk(KERN_WARNING "htb: too many events !\n");
>          return HZ/10;
>  to
>    return 1;
> 
> to drain extra events asap. It the time of writing I was not able to
> come with better solution and there were more bugs related to this
> part of code than now.

So this was meant to protect against endless loops?

> We want way to smooth big burst of events over more dequeue invocations
> in order to not slow dequeue too much. Constant 500 is max. allowed
> "slowdown" of dequeue.
> Any bright idea how to do it more elegant, Patrick ?


Unfortunately not, but I believe simply removing the limit
completely would be better than picking an arbitary value.

^ permalink raw reply

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: slavon @ 2008-01-15 16:04 UTC (permalink / raw)
  To: Frans Pop; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <200801151504.10047.elendil@planet.nl>

Quoting Frans Pop <elendil@planet.nl>:

> On Tuesday 15 January 2008, David Miller wrote:
>> From: Frans Pop <elendil@planet.nl>
>> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>>
>> Does this make the problem go away?
>
> Yes, it very much looks like that solves it.
> I ran with the patch for 6 hours or so without any errors. I then switched
> back to an unpatched kernel and they reappeared immediately.
>
>> (Note this isn't the final correct patch we should apply.  There
>>  is no reason why this revert back to the older ->poll() logic
>>  here should have any effect on the TX hang triggering...)
>
> s/no reason/no obvious reason/ ? ;-)
>
> Cheers,
> FJP
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

Hello.

I also try your patch (apply to 2.6.24-rc7-git2)

I catch this message in dmesg
[ 1771.796954] e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 1771.796957]   Tx Queue             <0>
[ 1771.796958]   TDH                  <54>
[ 1771.796959]   TDT                  <54>
[ 1771.796960]   next_to_use          <54>
[ 1771.796961]   next_to_clean        <a9>
[ 1771.796962] buffer_info[next_to_clean]
[ 1771.796963]   time_stamp           <14d72e>
[ 1771.796964]   next_to_watch        <a9>
[ 1771.796965]   jiffies              <14ddd3>
[ 1771.796966]   next_to_watch.status <1>

Thanks.


----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


^ permalink raw reply

* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Patrick McHardy @ 2008-01-15 16:05 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev
In-Reply-To: <478C94B7.3070503@bigtelecom.ru>

Badalian Vyacheslav wrote:
> Hello all. Have packetlost when do "tc qdisc del dev eth0 root".
> 
> look:
> 
> slavon ~ # ping -f 87.255.1.134
> PING 87.255.1.134 (87.255.1.134) 56(84) bytes of data.
> .
> ..................................................................................................................................................................................................................................................................... 
> 
> .
> --- 87.255.1.134 ping statistics ---
> 60811 packets transmitted, 60544 received, 0% packet loss, time 39528ms
> rtt min/avg/max/mdev = 0.203/0.579/3227.655/13.124 ms, pipe 219, 
> ipg/ewma 0.650/2.197 ms
> 
> Its normal?

Yes, packets in the old qdisc are lost.

> Maybe if tc do changes - need create second queue (hash of rules or how 
> you named it?) and do changes at it. Then replace old queue rules by 
> created new.
> Logic -
> 1. Do snapshot
> 2. Do changes in shapshot
> 3. All new packets go to snapshot
> 4. If old queue not have packets - delete it.
> 5. Snapshot its default.


That doesn't really work since qdiscs keep internal state that
in large parts depends on the packets queued. Take the qlen as
a simple example, the new qdisc doesn't know about the packets
in the old one and will exceed the limit.

^ permalink raw reply

* Re: Packetlost when "tc qdisc del dev eth0 root"
From: slavon @ 2008-01-15 16:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478CD9D6.3000504@trash.net>

I understand. Thanks!

> Badalian Vyacheslav wrote:
>> Hello all. Have packetlost when do "tc qdisc del dev eth0 root".
>>
>> look:
>>
>> slavon ~ # ping -f 87.255.1.134
>> PING 87.255.1.134 (87.255.1.134) 56(84) bytes of data.
>> .
>> .....................................................................................................................................................................................................................................................................   
>> .
>> --- 87.255.1.134 ping statistics ---
>> 60811 packets transmitted, 60544 received, 0% packet loss, time 39528ms
>> rtt min/avg/max/mdev = 0.203/0.579/3227.655/13.124 ms, pipe 219,   
>> ipg/ewma 0.650/2.197 ms
>>
>> Its normal?
>
> Yes, packets in the old qdisc are lost.
>
>> Maybe if tc do changes - need create second queue (hash of rules or  
>>  how you named it?) and do changes at it. Then replace old queue   
>> rules by created new.
>> Logic -
>> 1. Do snapshot
>> 2. Do changes in shapshot
>> 3. All new packets go to snapshot
>> 4. If old queue not have packets - delete it.
>> 5. Snapshot its default.
>
>
> That doesn't really work since qdiscs keep internal state that
> in large parts depends on the packets queued. Take the qlen as
> a simple example, the new qdisc doesn't know about the packets
> in the old one and will exceed the limit.



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Stephen Hemminger @ 2008-01-15 16:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, robert.olsson, netdev
In-Reply-To: <478C4FBE.5040308@cosmosbay.com>

On Tue, 15 Jan 2008 07:16:30 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:

> Eric Dumazet a écrit :
> > Stephen Hemminger a écrit :
> >> Combine the prefix information and the leaf together into one
> >> allocation. This is furthur simplified by converting the hlist
> >> into a simple bitfield.
> >>
> >> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
> >>
> >> This patch is experimental (ie it boots and loads routes), but
> >> is slower for the 163,000 route dump test.
> >>
> > 
> > Its also very memory expensive. That was not Robert suggestion I believe.
> > 
> > Its suggestion is to embedd one info into each leaves.
> > 
> > Please find attached to this mail a preliminary and incomplete patch I 
> > wrote this morning before coffee :), to get the idea.
> 
> Oops, patch reversed, sorry, and against another work pending in my tree.
> 
> Now time for cofee :)
> 

Okay, I would rather see the leaf_info explicit inside the leaf, also
your scheme probably breaks if I add two prefixes and then delete the first.
Let me have a go at it.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Robert Olsson @ 2008-01-15 16:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, robert.olsson, netdev
In-Reply-To: <20080115081905.3b8b6f05@deepthought>


Stephen Hemminger writes:

 > Okay, I would rather see the leaf_info explicit inside the leaf, also
 > your scheme probably breaks if I add two prefixes and then delete the first.
 > Let me have a go at it.

 I took Eric's patch a bit further... 
 
 Support for delete and dump is needed before any testing at all
 and maybe some macro for checking and setting FP-leaf's

 Cheers.
					--ro


diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 6dab753..f5b276c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -97,22 +97,32 @@ typedef unsigned int t_key;
 #define IS_LEAF(n) (n->parent & T_LEAF)
 
 struct node {
-	unsigned long parent;
-	t_key key;
+	unsigned long		parent;
+	t_key			key;
 };
 
 struct leaf {
-	unsigned long parent;
-	t_key key;
-	struct hlist_head list;
-	struct rcu_head rcu;
+	unsigned long		parent;
+	t_key			key;
+	/*
+	 * Because we often have only one info per leaf, we embedd one here
+	 * to save some space and speedup lookups (sharing cache line)
+	 * a sort of fastpatch leaf (FP-leaf) this indicated a negative of
+	 * plen_iinfo.
+	 * Note : not inside a structure so that we dont have holes on 64bit 
+	 */
+	int 			plen_iinfo;
+	struct list_head	falh_iinfo;
+
+	struct hlist_head	list;
+	struct rcu_head		rcu;
 };
 
 struct leaf_info {
-	struct hlist_node hlist;
-	struct rcu_head rcu;
-	int plen;
-	struct list_head falh;
+	struct hlist_node	hlist;
+	int			plen;
+	struct list_head	falh;
+	struct rcu_head		rcu;
 };
 
 struct tnode {
@@ -364,11 +374,13 @@ static inline void tnode_free(struct tnode *tn)
 		call_rcu(&tn->rcu, __tnode_free_rcu);
 }
 
-static struct leaf *leaf_new(void)
+static struct leaf *leaf_new(int plen)
 {
 	struct leaf *l = kmalloc(sizeof(struct leaf),  GFP_KERNEL);
 	if (l) {
 		l->parent = T_LEAF;
+		l->plen_iinfo = plen;
+		INIT_LIST_HEAD(&l->falh_iinfo);
 		INIT_HLIST_HEAD(&l->list);
 	}
 	return l;
@@ -890,7 +902,16 @@ static struct leaf_info *find_leaf_info(struct leaf *l, int plen)
 
 static inline struct list_head * get_fa_head(struct leaf *l, int plen)
 {
-	struct leaf_info *li = find_leaf_info(l, plen);
+	struct leaf_info *li;
+
+	if (l->plen_iinfo >= 0) {
+		if(l->plen_iinfo == plen)
+			return &l->falh_iinfo;
+		else
+			return NULL;
+	}
+
+	li = find_leaf_info(l, plen);
 
 	if (!li)
 		return NULL;
@@ -1036,6 +1057,21 @@ static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 
 	if (n != NULL && IS_LEAF(n) && tkey_equals(key, n->key)) {
 		l = (struct leaf *) n;
+
+		/* 
+		 * Check if it is a FP-leaf if so we have to convert 
+		 * it to an ordinary leaf with leaf_infos
+		 */
+
+		if( l->plen_iinfo >= 0 ) {
+			li = leaf_info_new(l->plen_iinfo);
+			if (!li)
+				return NULL;
+
+			insert_leaf_info(&l->list, li);
+			l->plen_iinfo = -1;
+		}
+
 		li = leaf_info_new(plen);
 
 		if (!li)
@@ -1045,21 +1081,16 @@ static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 		insert_leaf_info(&l->list, li);
 		goto done;
 	}
-	l = leaf_new();
+
+	/* Insert a FP-leaf */
+	l = leaf_new(plen);
 
 	if (!l)
 		return NULL;
 
 	l->key = key;
-	li = leaf_info_new(plen);
-
-	if (!li) {
-		tnode_free((struct tnode *) l);
-		return NULL;
-	}
+	fa_head = &l->falh_iinfo;
 
-	fa_head = &li->falh;
-	insert_leaf_info(&l->list, li);
 
 	if (t->trie && n == NULL) {
 		/* Case 2: n is NULL, and will just insert a new leaf */
@@ -1089,7 +1120,6 @@ static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 		}
 
 		if (!tn) {
-			free_leaf_info(li);
 			tnode_free((struct tnode *) l);
 			return NULL;
 		}
@@ -1285,6 +1315,20 @@ static inline int check_leaf(struct trie *t, struct leaf *l,
 	struct hlist_head *hhead = &l->list;
 	struct hlist_node *node;
 
+	i = l->plen_iinfo;	
+	if( i >= 0) {   /* FP-leaf */
+		mask = inet_make_mask(i);
+		err = fib_semantic_match(&l->falh_iinfo, flp, res, htonl(l->key), mask, i);
+		if(err <= 0 ) {
+			*plen = i;
+#ifdef CONFIG_IP_FIB_TRIE_STATS
+			t->stats.semantic_match_passed++;
+#endif
+		return err;
+		}
+	}
+
+
 	hlist_for_each_entry_rcu(li, node, hhead, hlist) {
 		i = li->plen;
 		mask = inet_make_mask(i);
@@ -1614,7 +1658,7 @@ static int fn_trie_delete(struct fib_table *tb, struct fib_config *cfg)
 	li = find_leaf_info(l, plen);
 
 	list_del_rcu(&fa->fa_list);
-
+// FIXME
 	if (list_empty(fa_head)) {
 		hlist_del_rcu(&li->hlist);
 		free_leaf_info(li);
@@ -2336,12 +2380,11 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 		seq_indent(seq, iter->depth);
 		seq_printf(seq, "  |-- %d.%d.%d.%d\n", NIPQUAD(val));
 		for (i = 32; i >= 0; i--) {
-			struct leaf_info *li = find_leaf_info(l, i);
+			struct list_head *fa_head = get_fa_head(l, i);
+			if (fa_head) {
 
-			if (li) {
 				struct fib_alias *fa;
-
-				list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+				list_for_each_entry_rcu(fa, fa_head, fa_list) {
 					char buf1[32], buf2[32];
 
 					seq_indent(seq, iter->depth+1);
@@ -2424,17 +2467,18 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 		return 0;
 
 	for (i=32; i>=0; i--) {
-		struct leaf_info *li = find_leaf_info(l, i);
+		//struct leaf_info *li = find_leaf_info(l, i);
+		struct list_head *fa_head = get_fa_head(l, i);
 		struct fib_alias *fa;
 		__be32 mask, prefix;
 
-		if (!li)
+		if (!fa_head)
 			continue;
 
-		mask = inet_make_mask(li->plen);
+		mask = inet_make_mask(i);
 		prefix = htonl(l->key);
 
-		list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+		list_for_each_entry_rcu(fa, fa_head, fa_list) {
 			const struct fib_info *fi = fa->fa_info;
 			unsigned flags = fib_flag_trans(fa->fa_type, mask, fi);
 

^ permalink raw reply related

* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-15 17:14 UTC (permalink / raw)
  To: Radoslaw Szkodzinski (AstralStorm); +Cc: Jarek Poplawski, netdev, linux-kernel
In-Reply-To: <20080115161705.0724250d@astralstorm.puszkin.org>

Radoslaw Szkodzinski (AstralStorm) wrote:
> On Tue, 15 Jan 2008 08:47:07 -0600
> "Chris Friesen" <cfriesen@nortel.com> wrote:

>>Some of our hardware is not supported on mainline, so we need per-kernel 
>>version patches to even bring up the blade.  The blades netboot via a 
>>jumbo-frame network, so kernel extensions are needed to handle setting 
>>the MTU before mounting the rootfs.

> Why? Can't you use a small initramfs to set it up?

Sure, if we changed our build system, engineered a suitable small 
userspace, etc.  At this point it's easier to maintain a small patch to 
the kernel dhcp parsing code so that we can specify mtu.

>>The blade in question uses CKRM 
>>which doesn't exist for newer kernels so the problem may simply be 
>>hidden by scheduling differences.

> Current spiritual successor is Ingo's realtime patchset I guess.

I think the group scheduling stuff for CFS is closer, but there are 
design and API differences that would require us to rework our system.

>>The userspace application uses other 
>>kernel features that are not in mainline (and likely some of them won't 
>>ever be in mainline--I know because I've tried).

> What would these be? Some proc or sysfs files that were removed or
> renamed?
> Maybe they can be worked around w/o changing the application at all, or
> very minor changes.

No, more than proc/sysfs.  Things like notification of process state 
change (think like SIGCHLD but sent to arbitrary processes), additional 
messaging protocol families, oom-killer protection, dirty page 
monitoring, backwards compatibility for "dcbz" on the ppc970, nested 
alternate signal stacks, and many others.  Between our kernel vendor's 
patches and our own, there are something like 600 patches applied to the 
mainline kernel.

> Also, be sure to check if there are pauses with other CPU hogs.

With the sctp hash patch applied we're now sitting with 45% cpu free on 
one cpu, and about 25% free on the other, and we're still seeing 
periodic bursts of rx fifo loss.  It's wierd.  Still trying to figure 
out what's going on.

Chris

^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Eric Dumazet @ 2008-01-15 17:23 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Radoslaw Szkodzinski (AstralStorm), Jarek Poplawski, netdev,
	linux-kernel
In-Reply-To: <478CE9F1.8020409@nortel.com>

On Tue, 15 Jan 2008 11:14:25 -0600
"Chris Friesen" <cfriesen@nortel.com> wrote:

> Radoslaw Szkodzinski (AstralStorm) wrote:
> > On Tue, 15 Jan 2008 08:47:07 -0600
> > "Chris Friesen" <cfriesen@nortel.com> wrote:
> 
> >>Some of our hardware is not supported on mainline, so we need per-kernel 
> >>version patches to even bring up the blade.  The blades netboot via a 
> >>jumbo-frame network, so kernel extensions are needed to handle setting 
> >>the MTU before mounting the rootfs.
> 
> > Why? Can't you use a small initramfs to set it up?
> 
> Sure, if we changed our build system, engineered a suitable small 
> userspace, etc.  At this point it's easier to maintain a small patch to 
> the kernel dhcp parsing code so that we can specify mtu.
> 
> >>The blade in question uses CKRM 
> >>which doesn't exist for newer kernels so the problem may simply be 
> >>hidden by scheduling differences.
> 
> > Current spiritual successor is Ingo's realtime patchset I guess.
> 
> I think the group scheduling stuff for CFS is closer, but there are 
> design and API differences that would require us to rework our system.
> 
> >>The userspace application uses other 
> >>kernel features that are not in mainline (and likely some of them won't 
> >>ever be in mainline--I know because I've tried).
> 
> > What would these be? Some proc or sysfs files that were removed or
> > renamed?
> > Maybe they can be worked around w/o changing the application at all, or
> > very minor changes.
> 
> No, more than proc/sysfs.  Things like notification of process state 
> change (think like SIGCHLD but sent to arbitrary processes), additional 
> messaging protocol families, oom-killer protection, dirty page 
> monitoring, backwards compatibility for "dcbz" on the ppc970, nested 
> alternate signal stacks, and many others.  Between our kernel vendor's 
> patches and our own, there are something like 600 patches applied to the 
> mainline kernel.
> 
> > Also, be sure to check if there are pauses with other CPU hogs.
> 
> With the sctp hash patch applied we're now sitting with 45% cpu free on 
> one cpu, and about 25% free on the other, and we're still seeing 
> periodic bursts of rx fifo loss.  It's wierd.  Still trying to figure 
> out what's going on.

On old kernels, some softirq functions can hog cpu for quite long times.

NIC RX rings can be filled and eventually losses occur.

Recent kernels tried to move some time consuming task from softirq to workqueue.

Examples:

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=29b7e29480029c896e7782ce2a45843abeea2f7a

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=d90bf5a976793edfa88d3bb2393f0231eb8ce1e5

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=39c90ece7565f5c47110c2fa77409d7a9478bd5b

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=86bba269d08f0c545ae76c90b56727f65d62d57f

...

^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Eric Dumazet @ 2008-01-15 17:25 UTC (permalink / raw)
  To: Robert Olsson; +Cc: Stephen Hemminger, David Miller, robert.olsson, netdev
In-Reply-To: <18316.58111.211387.271534@robur.slu.se>

On Tue, 15 Jan 2008 17:44:47 +0100
Robert Olsson <Robert.Olsson@data.slu.se> wrote:

> 
> Stephen Hemminger writes:
> 
>  > Okay, I would rather see the leaf_info explicit inside the leaf, also
>  > your scheme probably breaks if I add two prefixes and then delete the first.
>  > Let me have a go at it.
> 
>  I took Eric's patch a bit further... 
>  
>  Support for delete and dump is needed before any testing at all
>  and maybe some macro for checking and setting FP-leaf's
> 
>  Cheers.
> 					--ro
> 
> 
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 6dab753..f5b276c 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -97,22 +97,32 @@ typedef unsigned int t_key;
>  #define IS_LEAF(n) (n->parent & T_LEAF)
>  
>  struct node {
> -	unsigned long parent;
> -	t_key key;
> +	unsigned long		parent;
> +	t_key			key;
>  };
>  
>  struct leaf {
> -	unsigned long parent;
> -	t_key key;
> -	struct hlist_head list;
> -	struct rcu_head rcu;
> +	unsigned long		parent;
> +	t_key			key;
> +	/*
> +	 * Because we often have only one info per leaf, we embedd one here
> +	 * to save some space and speedup lookups (sharing cache line)
> +	 * a sort of fastpatch leaf (FP-leaf) this indicated a negative of
> +	 * plen_iinfo.
> +	 * Note : not inside a structure so that we dont have holes on 64bit 
> +	 */
> +	int 			plen_iinfo;
> +	struct list_head	falh_iinfo;

yep, I renamed them to einfo_plen & einfo_falh

> +
> +	struct hlist_head	list;
> +	struct rcu_head		rcu;
>  };
>  
>  struct leaf_info {
> -	struct hlist_node hlist;
> -	struct rcu_head rcu;
> -	int plen;
> -	struct list_head falh;
> +	struct hlist_node	hlist;
> +	int			plen;
> +	struct list_head	falh;
> +	struct rcu_head		rcu;
>  };
>  
>  struct tnode {
> @@ -364,11 +374,13 @@ static inline void tnode_free(struct tnode *tn)
>  		call_rcu(&tn->rcu, __tnode_free_rcu);
>  }
>  
> -static struct leaf *leaf_new(void)
> +static struct leaf *leaf_new(int plen)
>  {
>  	struct leaf *l = kmalloc(sizeof(struct leaf),  GFP_KERNEL);
>  	if (l) {
>  		l->parent = T_LEAF;
> +		l->plen_iinfo = plen;
> +		INIT_LIST_HEAD(&l->falh_iinfo);
>  		INIT_HLIST_HEAD(&l->list);
>  	}
>  	return l;
> @@ -890,7 +902,16 @@ static struct leaf_info *find_leaf_info(struct leaf *l, int plen)
>  
>  static inline struct list_head * get_fa_head(struct leaf *l, int plen)
>  {
> -	struct leaf_info *li = find_leaf_info(l, plen);
> +	struct leaf_info *li;
> +
> +	if (l->plen_iinfo >= 0) {
> +		if(l->plen_iinfo == plen)
> +			return &l->falh_iinfo;
> +		else
> +			return NULL;
> +	}

So you think that a leaf cannot have 2 infos, one 'embeded' and one in the list ?



^ permalink raw reply

* Re: [PATCH] net: EMAC: Fix problem with mtu > 4080 on non TAH equipped 4xx PPC's
From: Eugene Surovegin @ 2008-01-15 17:32 UTC (permalink / raw)
  To: Stefan Roese; +Cc: linuxppc-dev, netdev, benh
In-Reply-To: <1200400809-19720-1-git-send-email-sr@denx.de>

On Tue, Jan 15, 2008 at 01:40:09PM +0100, Stefan Roese wrote:
> Currently, all non TAH equipped 4xx PPC's call emac_start_xmit() upon
> xmit. This routine doesn't check if the frame length exceeds the max.
> MAL buffer size.
> 
> This patch now changes the driver to call emac_start_xmit_sg() on all
> platforms and not only the TAH equipped ones (440GX). This enables an
> MTU of 9000 instead 4080.
> 
> Tested on Kilauea (405EX) with gbit link -> jumbo frames enabled.
> 
> Signed-off-by: Stefan Roese <sr@denx.de>
> ---
> Eugene & Ben, do you see any problems with this patch? If not, then I'll
> send another version for the newemac driver too.

Hmm, so why not make GigE support a condition to hook SG version of 
xmit then? I don't like when you change behaviour for chips where it 
perefectly legal not to do this check because you cannot change MTU 
anyways.

-- 
Eugene


^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Stephen Hemminger @ 2008-01-15 17:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Robert Olsson, David Miller, robert.olsson, netdev
In-Reply-To: <20080115182544.98c18d08.dada1@cosmosbay.com>

This is how I did it:

--- a/net/ipv4/fib_trie.c	2008-01-15 09:14:53.000000000 -0800
+++ b/net/ipv4/fib_trie.c	2008-01-15 09:21:48.000000000 -0800
@@ -101,13 +101,6 @@ struct node {
 	t_key key;
 };
 
-struct leaf {
-	unsigned long parent;
-	t_key key;
-	struct hlist_head list;
-	struct rcu_head rcu;
-};
-
 struct leaf_info {
 	struct hlist_node hlist;
 	struct rcu_head rcu;
@@ -115,6 +108,13 @@ struct leaf_info {
 	struct list_head falh;
 };
 
+struct leaf {
+	unsigned long parent;
+	t_key key;
+	struct hlist_head list;
+	struct rcu_head rcu;
+};
+
 struct tnode {
 	unsigned long parent;
 	t_key key;
@@ -321,16 +321,6 @@ static void __leaf_free_rcu(struct rcu_h
 	kmem_cache_free(trie_leaf_kmem, leaf);
 }
 
-static void __leaf_info_free_rcu(struct rcu_head *head)
-{
-	kfree(container_of(head, struct leaf_info, rcu));
-}
-
-static inline void free_leaf_info(struct leaf_info *leaf)
-{
-	call_rcu(&leaf->rcu, __leaf_info_free_rcu);
-}
-
 static struct tnode *tnode_alloc(size_t size)
 {
 	struct page *pages;
@@ -357,7 +347,7 @@ static void __tnode_free_rcu(struct rcu_
 		free_pages((unsigned long)tn, get_order(size));
 }
 
-static inline void tnode_free(struct tnode *tn)
+static void tnode_free(struct tnode *tn)
 {
 	if (IS_LEAF(tn)) {
 		struct leaf *l = (struct leaf *) tn;
@@ -376,16 +366,41 @@ static struct leaf *leaf_new(void)
 	return l;
 }
 
+static void leaf_info_init(struct leaf_info *li, int plen)
+{
+	li->plen = plen;
+	INIT_LIST_HEAD(&li->falh);
+}
+
+static struct leaf_info *leaf_info_first(struct leaf *l, int plen)
+{
+	struct leaf_info *li = (struct leaf_info *) (l + 1);
+	leaf_info_init(li, plen);
+	return li;
+}
+
 static struct leaf_info *leaf_info_new(int plen)
 {
 	struct leaf_info *li = kmalloc(sizeof(struct leaf_info),  GFP_KERNEL);
-	if (li) {
-		li->plen = plen;
-		INIT_LIST_HEAD(&li->falh);
-	}
+	if (li)
+		leaf_info_init(li, plen);
+
 	return li;
 }
 
+static void __leaf_info_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct leaf_info, rcu));
+}
+
+static inline void free_leaf_info(struct leaf *l, struct leaf_info *leaf)
+{
+	if (leaf == (struct leaf_info *)(l + 1))
+		return;
+
+	call_rcu(&leaf->rcu, __leaf_info_free_rcu);
+}
+
 static struct tnode* tnode_new(t_key key, int pos, int bits)
 {
 	size_t sz = sizeof(struct tnode) + (sizeof(struct node *) << bits);
@@ -1047,18 +1062,13 @@ static struct list_head *fib_insert_node
 		insert_leaf_info(&l->list, li);
 		goto done;
 	}
-	l = leaf_new();
 
+	l = leaf_new();
 	if (!l)
 		return NULL;
 
 	l->key = key;
-	li = leaf_info_new(plen);
-
-	if (!li) {
-		tnode_free((struct tnode *) l);
-		return NULL;
-	}
+	li = leaf_info_first(l, plen);
 
 	fa_head = &li->falh;
 	insert_leaf_info(&l->list, li);
@@ -1091,7 +1101,7 @@ static struct list_head *fib_insert_node
 		}
 
 		if (!tn) {
-			free_leaf_info(li);
+			free_leaf_info(l, li);
 			tnode_free((struct tnode *) l);
 			return NULL;
 		}
@@ -1624,7 +1634,7 @@ static int fn_trie_delete(struct fib_tab
 
 	if (list_empty(fa_head)) {
 		hlist_del_rcu(&li->hlist);
-		free_leaf_info(li);
+		free_leaf_info(l, li);
 	}
 
 	if (hlist_empty(&l->list))
@@ -1668,7 +1678,7 @@ static int trie_flush_leaf(struct trie *
 
 		if (list_empty(&li->falh)) {
 			hlist_del_rcu(&li->hlist);
-			free_leaf_info(li);
+			free_leaf_info(l, li);
 		}
 	}
 	return found;
@@ -1935,7 +1945,8 @@ void __init fib_hash_init(void)
 	fn_alias_kmem = kmem_cache_create("ip_fib_alias", sizeof(struct fib_alias),
 					  0, SLAB_PANIC, NULL);
 
-	trie_leaf_kmem = kmem_cache_create("ip_fib_trie", sizeof(struct leaf),
+	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
+					   sizeof(struct leaf) + sizeof(struct leaf_info),
 					   0, SLAB_PANIC, NULL);
 }
 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox