Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net: EMAC: Fix problem with mtu > 4080 on non TAH  equipped 4xx PPC's
From: Stefan Roese @ 2008-01-15 19:46 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, netdev, benh
In-Reply-To: <20080115173202.GA1268@gate.ebshome.net>

On Tuesday 15 January 2008, Eugene Surovegin wrote:
> On Tue, Jan 15, 2008 at 01:40:09PM +0100, Stefan Roese wrote:
> > Currently, all non TAH equipped 4xx PPC's call emac_start_xmit() upon
> > xmit. This routine doesn't check if the frame length exceeds the max.
> > MAL buffer size.
> >
> > This patch now changes the driver to call emac_start_xmit_sg() on all
> > platforms and not only the TAH equipped ones (440GX). This enables an
> > MTU of 9000 instead 4080.
> >
> > Tested on Kilauea (405EX) with gbit link -> jumbo frames enabled.
> >
> > Signed-off-by: Stefan Roese <sr@denx.de>
> > ---
> > Eugene & Ben, do you see any problems with this patch? If not, then I'll
> > send another version for the newemac driver too.
>
> Hmm, so why not make GigE support a condition to hook SG version of
> xmit then? I don't like when you change behaviour for chips where it
> perefectly legal not to do this check because you cannot change MTU
> anyways.

OK. But how do we detect GigE support? Seems like GigE enabled devices have 
CONFIG_IBM_EMAC4 defined. If nobody objects I'll fix up another version 
tomorrow.

Thanks.

Best regards,
Stefan

^ permalink raw reply

* Re: sky2 patch in 2.6.24-rc7-git6 breaks POST - commit 84cd2dfb04d23a961c5f537baa243fa54d0987ac
From: Stephen Hemminger @ 2008-01-15 19:16 UTC (permalink / raw)
  To: Ioan Ionita; +Cc: netdev, jeff, Linux Kernel Mailing List
In-Reply-To: <df47b87a0801151100qe7c39f1y3296176f343667eb@mail.gmail.com>

On Tue, 15 Jan 2008 21:00:13 +0200
"Ioan Ionita" <opslynx@gmail.com> wrote:

> On Jan 15, 2008 8:09 PM, Stephen Hemminger <stephen.hemminger@vyatta.com> wrote:
> > On Tue, 15 Jan 2008 13:01:47 +0200
> > "Ioan Ionita" <opslynx@gmail.com> wrote:
> >
> 
> >
> > Is Wake On Lan enabled in the BIOS?
> 
> I don't have a Wake on Lan option in BIOS, only wake on pci, i assume
> it's the same.  It was disabled. I enabled it and the regression
> behaved the same way.  Wouldn't POST after shutdown or soft reset.
> 

I assume if you do:
	ethtool -s eth0 wol d

it will POST okay.  Okay, for now I'll have Jeff revert the patch.
Looks like the BIOS in your system is broken for WOL.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: sky2 patch in 2.6.24-rc7-git6 breaks POST - commit 84cd2dfb04d23a961c5f537baa243fa54d0987ac
From: Ioan Ionita @ 2008-01-15 19:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, jeff, Linux Kernel Mailing List
In-Reply-To: <20080115100947.4ec6cda0@deepthought>

On Jan 15, 2008 8:09 PM, Stephen Hemminger <stephen.hemminger@vyatta.com> wrote:
> On Tue, 15 Jan 2008 13:01:47 +0200
> "Ioan Ionita" <opslynx@gmail.com> wrote:
>

>
> Is Wake On Lan enabled in the BIOS?

I don't have a Wake on Lan option in BIOS, only wake on pci, i assume
it's the same.  It was disabled. I enabled it and the regression
behaved the same way.  Wouldn't POST after shutdown or soft reset.

Regards

^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Eric Dumazet @ 2008-01-15 18:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Robert Olsson, David Miller, robert.olsson, netdev
In-Reply-To: <20080115101543.44c29fa7@deepthought>

On Tue, 15 Jan 2008 10:15:43 -0800
Stephen Hemminger <stephen.hemminger@vyatta.com> wrote:

> On Tue, 15 Jan 2008 19:10:31 +0100
> Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
> > On Tue, 15 Jan 2008 09:47:53 -0800
> > Stephen Hemminger <stephen.hemminger@vyatta.com> wrote:
> > 
> > > This is how I did it:
> > > 
> > > --- a/net/ipv4/fib_trie.c	2008-01-15 09:14:53.000000000 -0800
> > > +++ b/net/ipv4/fib_trie.c	2008-01-15 09:21:48.000000000 -0800
> > > @@ -101,13 +101,6 @@ struct node {
> > >  	t_key key;
> > >  };
> > >  
> > > -struct leaf {
> > > -	unsigned long parent;
> > > -	t_key key;
> > > -	struct hlist_head list;
> > > -	struct rcu_head rcu;
> > > -};
> > > -
> > >  struct leaf_info {
> > >  	struct hlist_node hlist;
> > >  	struct rcu_head rcu;
> > > @@ -115,6 +108,13 @@ struct leaf_info {
> > >  	struct list_head falh;
> > >  };
> > >  
> > > +struct leaf {
> > > +	unsigned long parent;
> > > +	t_key key;
> > > +	struct hlist_head list;
> > > +	struct rcu_head rcu;
> > > +};
> > 
> > I like this :)
> > 
> > Your design is clean, but we waste some space (rcu in leaf_info "included"), we probably can do a litle bit better
> > (moving rcu at the end of leaf_info, and kmem_cache_create("ip_fib_trie", sizeof(struct leaf) + sizeof(struct_leaf_info) - sizeof(struct rcu_head))
> > 
> >  
> > > -	trie_leaf_kmem = kmem_cache_create("ip_fib_trie", sizeof(struct leaf),
> > > +	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
> > > +					   sizeof(struct leaf) + sizeof(struct leaf_info),
> > >  					   0, SLAB_PANIC, NULL);
> > >  }
> > >  
> > > 
> > 
> > Thank you
> 
> Having multiple RCU links is a waste. I started on code that just splice's the
> leaf_info's off to a free_list and then do a mass free after and RCU barrier.
> 
> For the normal case of just freeing a leaf, it could just walk the chain
> in the RCU free of the leaf.

Well, If you take this path, we can also copy the leaf itself, and
just use a variable size array of infos[], no more list, and maximal locality

kmalloc(sizeof(leaf) + nb_infos * sizeof(info))

(Still we can use a kmem cache for nb_infos=1 leaves)


^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Stephen Hemminger @ 2008-01-15 18:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Robert Olsson, David Miller, robert.olsson, netdev
In-Reply-To: <20080115191031.7ce7219b.dada1@cosmosbay.com>

On Tue, 15 Jan 2008 19:10:31 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:

> On Tue, 15 Jan 2008 09:47:53 -0800
> Stephen Hemminger <stephen.hemminger@vyatta.com> wrote:
> 
> > This is how I did it:
> > 
> > --- a/net/ipv4/fib_trie.c	2008-01-15 09:14:53.000000000 -0800
> > +++ b/net/ipv4/fib_trie.c	2008-01-15 09:21:48.000000000 -0800
> > @@ -101,13 +101,6 @@ struct node {
> >  	t_key key;
> >  };
> >  
> > -struct leaf {
> > -	unsigned long parent;
> > -	t_key key;
> > -	struct hlist_head list;
> > -	struct rcu_head rcu;
> > -};
> > -
> >  struct leaf_info {
> >  	struct hlist_node hlist;
> >  	struct rcu_head rcu;
> > @@ -115,6 +108,13 @@ struct leaf_info {
> >  	struct list_head falh;
> >  };
> >  
> > +struct leaf {
> > +	unsigned long parent;
> > +	t_key key;
> > +	struct hlist_head list;
> > +	struct rcu_head rcu;
> > +};
> 
> I like this :)
> 
> Your design is clean, but we waste some space (rcu in leaf_info "included"), we probably can do a litle bit better
> (moving rcu at the end of leaf_info, and kmem_cache_create("ip_fib_trie", sizeof(struct leaf) + sizeof(struct_leaf_info) - sizeof(struct rcu_head))
> 
>  
> > -	trie_leaf_kmem = kmem_cache_create("ip_fib_trie", sizeof(struct leaf),
> > +	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
> > +					   sizeof(struct leaf) + sizeof(struct leaf_info),
> >  					   0, SLAB_PANIC, NULL);
> >  }
> >  
> > 
> 
> Thank you

Having multiple RCU links is a waste. I started on code that just splice's the
leaf_info's off to a free_list and then do a mass free after and RCU barrier.

For the normal case of just freeing a leaf, it could just walk the chain
in the RCU free of the leaf.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: sky2 patch in 2.6.24-rc7-git6 breaks POST - commit 84cd2dfb04d23a961c5f537baa243fa54d0987ac
From: Stephen Hemminger @ 2008-01-15 18:09 UTC (permalink / raw)
  To: Ioan Ionita; +Cc: netdev, jeff, Linux Kernel Mailing List
In-Reply-To: <df47b87a0801150301w6b970decwdd4bd256aa2ca9ac@mail.gmail.com>

On Tue, 15 Jan 2008 13:01:47 +0200
"Ioan Ionita" <opslynx@gmail.com> wrote:

> Hi,
> 
> I have an Asus Commando motherboard, p965 chipset with Marvell
> 88E8056 and 88E8001 gigabit lan onboard. skge and sky2 are compiled
> in.
> When booting linux-2.6.24-rc7-git6, everything seemed fine, but when
> attempting to reboot, the machine would freeze on POST, right before
> detecting AHCI drives.
> In order to get it to POST, a reset or shutdown from the power button
> is not enough, I actually have to cut power to the motherboard using
> the PSU's power switch!
> 
> I first thought that the BIOS or CMOS may have gotten corrupted, but
> eventually I fund that linux was the culprit. After a git bisect, I
> found the bad commit below
> 

Is Wake On Lan enabled in the BIOS?

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Eric Dumazet @ 2008-01-15 18:10 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Robert Olsson, David Miller, robert.olsson, netdev
In-Reply-To: <20080115094753.32e35823@deepthought>

On Tue, 15 Jan 2008 09:47:53 -0800
Stephen Hemminger <stephen.hemminger@vyatta.com> wrote:

> This is how I did it:
> 
> --- a/net/ipv4/fib_trie.c	2008-01-15 09:14:53.000000000 -0800
> +++ b/net/ipv4/fib_trie.c	2008-01-15 09:21:48.000000000 -0800
> @@ -101,13 +101,6 @@ struct node {
>  	t_key key;
>  };
>  
> -struct leaf {
> -	unsigned long parent;
> -	t_key key;
> -	struct hlist_head list;
> -	struct rcu_head rcu;
> -};
> -
>  struct leaf_info {
>  	struct hlist_node hlist;
>  	struct rcu_head rcu;
> @@ -115,6 +108,13 @@ struct leaf_info {
>  	struct list_head falh;
>  };
>  
> +struct leaf {
> +	unsigned long parent;
> +	t_key key;
> +	struct hlist_head list;
> +	struct rcu_head rcu;
> +};

I like this :)

Your design is clean, but we waste some space (rcu in leaf_info "included"), we probably can do a litle bit better
(moving rcu at the end of leaf_info, and kmem_cache_create("ip_fib_trie", sizeof(struct leaf) + sizeof(struct_leaf_info) - sizeof(struct rcu_head))

 
> -	trie_leaf_kmem = kmem_cache_create("ip_fib_trie", sizeof(struct leaf),
> +	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
> +					   sizeof(struct leaf) + sizeof(struct leaf_info),
>  					   0, SLAB_PANIC, NULL);
>  }
>  
> 

Thank you

^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Robert Olsson @ 2008-01-15 17:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Robert Olsson, Stephen Hemminger, David Miller, robert.olsson,
	netdev
In-Reply-To: <20080115182544.98c18d08.dada1@cosmosbay.com>


Eric Dumazet writes:

 > 
 > So you think that a leaf cannot have 2 infos, one 'embeded' and one in the list ?
 
 Hello,
 
 The model I thought of is to have either: 

 1) One leaf_info embedded in leaf. A fast-path leaf. FP-leaf
 Or 
 2) The intct old leaf_info list with arbitrary number leaf_infos

 If plen_iinfo is >=0  It's a FP-leaf
 
 Cheers.
					--ro


^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Stephen Hemminger @ 2008-01-15 17:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Robert Olsson, David Miller, robert.olsson, netdev
In-Reply-To: <20080115182544.98c18d08.dada1@cosmosbay.com>

This is how I did it:

--- a/net/ipv4/fib_trie.c	2008-01-15 09:14:53.000000000 -0800
+++ b/net/ipv4/fib_trie.c	2008-01-15 09:21:48.000000000 -0800
@@ -101,13 +101,6 @@ struct node {
 	t_key key;
 };
 
-struct leaf {
-	unsigned long parent;
-	t_key key;
-	struct hlist_head list;
-	struct rcu_head rcu;
-};
-
 struct leaf_info {
 	struct hlist_node hlist;
 	struct rcu_head rcu;
@@ -115,6 +108,13 @@ struct leaf_info {
 	struct list_head falh;
 };
 
+struct leaf {
+	unsigned long parent;
+	t_key key;
+	struct hlist_head list;
+	struct rcu_head rcu;
+};
+
 struct tnode {
 	unsigned long parent;
 	t_key key;
@@ -321,16 +321,6 @@ static void __leaf_free_rcu(struct rcu_h
 	kmem_cache_free(trie_leaf_kmem, leaf);
 }
 
-static void __leaf_info_free_rcu(struct rcu_head *head)
-{
-	kfree(container_of(head, struct leaf_info, rcu));
-}
-
-static inline void free_leaf_info(struct leaf_info *leaf)
-{
-	call_rcu(&leaf->rcu, __leaf_info_free_rcu);
-}
-
 static struct tnode *tnode_alloc(size_t size)
 {
 	struct page *pages;
@@ -357,7 +347,7 @@ static void __tnode_free_rcu(struct rcu_
 		free_pages((unsigned long)tn, get_order(size));
 }
 
-static inline void tnode_free(struct tnode *tn)
+static void tnode_free(struct tnode *tn)
 {
 	if (IS_LEAF(tn)) {
 		struct leaf *l = (struct leaf *) tn;
@@ -376,16 +366,41 @@ static struct leaf *leaf_new(void)
 	return l;
 }
 
+static void leaf_info_init(struct leaf_info *li, int plen)
+{
+	li->plen = plen;
+	INIT_LIST_HEAD(&li->falh);
+}
+
+static struct leaf_info *leaf_info_first(struct leaf *l, int plen)
+{
+	struct leaf_info *li = (struct leaf_info *) (l + 1);
+	leaf_info_init(li, plen);
+	return li;
+}
+
 static struct leaf_info *leaf_info_new(int plen)
 {
 	struct leaf_info *li = kmalloc(sizeof(struct leaf_info),  GFP_KERNEL);
-	if (li) {
-		li->plen = plen;
-		INIT_LIST_HEAD(&li->falh);
-	}
+	if (li)
+		leaf_info_init(li, plen);
+
 	return li;
 }
 
+static void __leaf_info_free_rcu(struct rcu_head *head)
+{
+	kfree(container_of(head, struct leaf_info, rcu));
+}
+
+static inline void free_leaf_info(struct leaf *l, struct leaf_info *leaf)
+{
+	if (leaf == (struct leaf_info *)(l + 1))
+		return;
+
+	call_rcu(&leaf->rcu, __leaf_info_free_rcu);
+}
+
 static struct tnode* tnode_new(t_key key, int pos, int bits)
 {
 	size_t sz = sizeof(struct tnode) + (sizeof(struct node *) << bits);
@@ -1047,18 +1062,13 @@ static struct list_head *fib_insert_node
 		insert_leaf_info(&l->list, li);
 		goto done;
 	}
-	l = leaf_new();
 
+	l = leaf_new();
 	if (!l)
 		return NULL;
 
 	l->key = key;
-	li = leaf_info_new(plen);
-
-	if (!li) {
-		tnode_free((struct tnode *) l);
-		return NULL;
-	}
+	li = leaf_info_first(l, plen);
 
 	fa_head = &li->falh;
 	insert_leaf_info(&l->list, li);
@@ -1091,7 +1101,7 @@ static struct list_head *fib_insert_node
 		}
 
 		if (!tn) {
-			free_leaf_info(li);
+			free_leaf_info(l, li);
 			tnode_free((struct tnode *) l);
 			return NULL;
 		}
@@ -1624,7 +1634,7 @@ static int fn_trie_delete(struct fib_tab
 
 	if (list_empty(fa_head)) {
 		hlist_del_rcu(&li->hlist);
-		free_leaf_info(li);
+		free_leaf_info(l, li);
 	}
 
 	if (hlist_empty(&l->list))
@@ -1668,7 +1678,7 @@ static int trie_flush_leaf(struct trie *
 
 		if (list_empty(&li->falh)) {
 			hlist_del_rcu(&li->hlist);
-			free_leaf_info(li);
+			free_leaf_info(l, li);
 		}
 	}
 	return found;
@@ -1935,7 +1945,8 @@ void __init fib_hash_init(void)
 	fn_alias_kmem = kmem_cache_create("ip_fib_alias", sizeof(struct fib_alias),
 					  0, SLAB_PANIC, NULL);
 
-	trie_leaf_kmem = kmem_cache_create("ip_fib_trie", sizeof(struct leaf),
+	trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
+					   sizeof(struct leaf) + sizeof(struct leaf_info),
 					   0, SLAB_PANIC, NULL);
 }
 

^ permalink raw reply

* Re: [PATCH] net: EMAC: Fix problem with mtu > 4080 on non TAH equipped 4xx PPC's
From: Eugene Surovegin @ 2008-01-15 17:32 UTC (permalink / raw)
  To: Stefan Roese; +Cc: linuxppc-dev, netdev, benh
In-Reply-To: <1200400809-19720-1-git-send-email-sr@denx.de>

On Tue, Jan 15, 2008 at 01:40:09PM +0100, Stefan Roese wrote:
> Currently, all non TAH equipped 4xx PPC's call emac_start_xmit() upon
> xmit. This routine doesn't check if the frame length exceeds the max.
> MAL buffer size.
> 
> This patch now changes the driver to call emac_start_xmit_sg() on all
> platforms and not only the TAH equipped ones (440GX). This enables an
> MTU of 9000 instead 4080.
> 
> Tested on Kilauea (405EX) with gbit link -> jumbo frames enabled.
> 
> Signed-off-by: Stefan Roese <sr@denx.de>
> ---
> Eugene & Ben, do you see any problems with this patch? If not, then I'll
> send another version for the newemac driver too.

Hmm, so why not make GigE support a condition to hook SG version of 
xmit then? I don't like when you change behaviour for chips where it 
perefectly legal not to do this check because you cannot change MTU 
anyways.

-- 
Eugene


^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Eric Dumazet @ 2008-01-15 17:25 UTC (permalink / raw)
  To: Robert Olsson; +Cc: Stephen Hemminger, David Miller, robert.olsson, netdev
In-Reply-To: <18316.58111.211387.271534@robur.slu.se>

On Tue, 15 Jan 2008 17:44:47 +0100
Robert Olsson <Robert.Olsson@data.slu.se> wrote:

> 
> Stephen Hemminger writes:
> 
>  > Okay, I would rather see the leaf_info explicit inside the leaf, also
>  > your scheme probably breaks if I add two prefixes and then delete the first.
>  > Let me have a go at it.
> 
>  I took Eric's patch a bit further... 
>  
>  Support for delete and dump is needed before any testing at all
>  and maybe some macro for checking and setting FP-leaf's
> 
>  Cheers.
> 					--ro
> 
> 
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 6dab753..f5b276c 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -97,22 +97,32 @@ typedef unsigned int t_key;
>  #define IS_LEAF(n) (n->parent & T_LEAF)
>  
>  struct node {
> -	unsigned long parent;
> -	t_key key;
> +	unsigned long		parent;
> +	t_key			key;
>  };
>  
>  struct leaf {
> -	unsigned long parent;
> -	t_key key;
> -	struct hlist_head list;
> -	struct rcu_head rcu;
> +	unsigned long		parent;
> +	t_key			key;
> +	/*
> +	 * Because we often have only one info per leaf, we embedd one here
> +	 * to save some space and speedup lookups (sharing cache line)
> +	 * a sort of fastpatch leaf (FP-leaf) this indicated a negative of
> +	 * plen_iinfo.
> +	 * Note : not inside a structure so that we dont have holes on 64bit 
> +	 */
> +	int 			plen_iinfo;
> +	struct list_head	falh_iinfo;

yep, I renamed them to einfo_plen & einfo_falh

> +
> +	struct hlist_head	list;
> +	struct rcu_head		rcu;
>  };
>  
>  struct leaf_info {
> -	struct hlist_node hlist;
> -	struct rcu_head rcu;
> -	int plen;
> -	struct list_head falh;
> +	struct hlist_node	hlist;
> +	int			plen;
> +	struct list_head	falh;
> +	struct rcu_head		rcu;
>  };
>  
>  struct tnode {
> @@ -364,11 +374,13 @@ static inline void tnode_free(struct tnode *tn)
>  		call_rcu(&tn->rcu, __tnode_free_rcu);
>  }
>  
> -static struct leaf *leaf_new(void)
> +static struct leaf *leaf_new(int plen)
>  {
>  	struct leaf *l = kmalloc(sizeof(struct leaf),  GFP_KERNEL);
>  	if (l) {
>  		l->parent = T_LEAF;
> +		l->plen_iinfo = plen;
> +		INIT_LIST_HEAD(&l->falh_iinfo);
>  		INIT_HLIST_HEAD(&l->list);
>  	}
>  	return l;
> @@ -890,7 +902,16 @@ static struct leaf_info *find_leaf_info(struct leaf *l, int plen)
>  
>  static inline struct list_head * get_fa_head(struct leaf *l, int plen)
>  {
> -	struct leaf_info *li = find_leaf_info(l, plen);
> +	struct leaf_info *li;
> +
> +	if (l->plen_iinfo >= 0) {
> +		if(l->plen_iinfo == plen)
> +			return &l->falh_iinfo;
> +		else
> +			return NULL;
> +	}

So you think that a leaf cannot have 2 infos, one 'embeded' and one in the list ?



^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Eric Dumazet @ 2008-01-15 17:23 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Radoslaw Szkodzinski (AstralStorm), Jarek Poplawski, netdev,
	linux-kernel
In-Reply-To: <478CE9F1.8020409@nortel.com>

On Tue, 15 Jan 2008 11:14:25 -0600
"Chris Friesen" <cfriesen@nortel.com> wrote:

> Radoslaw Szkodzinski (AstralStorm) wrote:
> > On Tue, 15 Jan 2008 08:47:07 -0600
> > "Chris Friesen" <cfriesen@nortel.com> wrote:
> 
> >>Some of our hardware is not supported on mainline, so we need per-kernel 
> >>version patches to even bring up the blade.  The blades netboot via a 
> >>jumbo-frame network, so kernel extensions are needed to handle setting 
> >>the MTU before mounting the rootfs.
> 
> > Why? Can't you use a small initramfs to set it up?
> 
> Sure, if we changed our build system, engineered a suitable small 
> userspace, etc.  At this point it's easier to maintain a small patch to 
> the kernel dhcp parsing code so that we can specify mtu.
> 
> >>The blade in question uses CKRM 
> >>which doesn't exist for newer kernels so the problem may simply be 
> >>hidden by scheduling differences.
> 
> > Current spiritual successor is Ingo's realtime patchset I guess.
> 
> I think the group scheduling stuff for CFS is closer, but there are 
> design and API differences that would require us to rework our system.
> 
> >>The userspace application uses other 
> >>kernel features that are not in mainline (and likely some of them won't 
> >>ever be in mainline--I know because I've tried).
> 
> > What would these be? Some proc or sysfs files that were removed or
> > renamed?
> > Maybe they can be worked around w/o changing the application at all, or
> > very minor changes.
> 
> No, more than proc/sysfs.  Things like notification of process state 
> change (think like SIGCHLD but sent to arbitrary processes), additional 
> messaging protocol families, oom-killer protection, dirty page 
> monitoring, backwards compatibility for "dcbz" on the ppc970, nested 
> alternate signal stacks, and many others.  Between our kernel vendor's 
> patches and our own, there are something like 600 patches applied to the 
> mainline kernel.
> 
> > Also, be sure to check if there are pauses with other CPU hogs.
> 
> With the sctp hash patch applied we're now sitting with 45% cpu free on 
> one cpu, and about 25% free on the other, and we're still seeing 
> periodic bursts of rx fifo loss.  It's wierd.  Still trying to figure 
> out what's going on.

On old kernels, some softirq functions can hog cpu for quite long times.

NIC RX rings can be filled and eventually losses occur.

Recent kernels tried to move some time consuming task from softirq to workqueue.

Examples:

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=29b7e29480029c896e7782ce2a45843abeea2f7a

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=d90bf5a976793edfa88d3bb2393f0231eb8ce1e5

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=39c90ece7565f5c47110c2fa77409d7a9478bd5b

http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.25.git;a=commit;h=86bba269d08f0c545ae76c90b56727f65d62d57f

...

^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-15 17:14 UTC (permalink / raw)
  To: Radoslaw Szkodzinski (AstralStorm); +Cc: Jarek Poplawski, netdev, linux-kernel
In-Reply-To: <20080115161705.0724250d@astralstorm.puszkin.org>

Radoslaw Szkodzinski (AstralStorm) wrote:
> On Tue, 15 Jan 2008 08:47:07 -0600
> "Chris Friesen" <cfriesen@nortel.com> wrote:

>>Some of our hardware is not supported on mainline, so we need per-kernel 
>>version patches to even bring up the blade.  The blades netboot via a 
>>jumbo-frame network, so kernel extensions are needed to handle setting 
>>the MTU before mounting the rootfs.

> Why? Can't you use a small initramfs to set it up?

Sure, if we changed our build system, engineered a suitable small 
userspace, etc.  At this point it's easier to maintain a small patch to 
the kernel dhcp parsing code so that we can specify mtu.

>>The blade in question uses CKRM 
>>which doesn't exist for newer kernels so the problem may simply be 
>>hidden by scheduling differences.

> Current spiritual successor is Ingo's realtime patchset I guess.

I think the group scheduling stuff for CFS is closer, but there are 
design and API differences that would require us to rework our system.

>>The userspace application uses other 
>>kernel features that are not in mainline (and likely some of them won't 
>>ever be in mainline--I know because I've tried).

> What would these be? Some proc or sysfs files that were removed or
> renamed?
> Maybe they can be worked around w/o changing the application at all, or
> very minor changes.

No, more than proc/sysfs.  Things like notification of process state 
change (think like SIGCHLD but sent to arbitrary processes), additional 
messaging protocol families, oom-killer protection, dirty page 
monitoring, backwards compatibility for "dcbz" on the ppc970, nested 
alternate signal stacks, and many others.  Between our kernel vendor's 
patches and our own, there are something like 600 patches applied to the 
mainline kernel.

> Also, be sure to check if there are pauses with other CPU hogs.

With the sctp hash patch applied we're now sitting with 45% cpu free on 
one cpu, and about 25% free on the other, and we're still seeing 
periodic bursts of rx fifo loss.  It's wierd.  Still trying to figure 
out what's going on.

Chris

^ permalink raw reply

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Robert Olsson @ 2008-01-15 16:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, robert.olsson, netdev
In-Reply-To: <20080115081905.3b8b6f05@deepthought>


Stephen Hemminger writes:

 > Okay, I would rather see the leaf_info explicit inside the leaf, also
 > your scheme probably breaks if I add two prefixes and then delete the first.
 > Let me have a go at it.

 I took Eric's patch a bit further... 
 
 Support for delete and dump is needed before any testing at all
 and maybe some macro for checking and setting FP-leaf's

 Cheers.
					--ro


diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 6dab753..f5b276c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -97,22 +97,32 @@ typedef unsigned int t_key;
 #define IS_LEAF(n) (n->parent & T_LEAF)
 
 struct node {
-	unsigned long parent;
-	t_key key;
+	unsigned long		parent;
+	t_key			key;
 };
 
 struct leaf {
-	unsigned long parent;
-	t_key key;
-	struct hlist_head list;
-	struct rcu_head rcu;
+	unsigned long		parent;
+	t_key			key;
+	/*
+	 * Because we often have only one info per leaf, we embedd one here
+	 * to save some space and speedup lookups (sharing cache line)
+	 * a sort of fastpatch leaf (FP-leaf) this indicated a negative of
+	 * plen_iinfo.
+	 * Note : not inside a structure so that we dont have holes on 64bit 
+	 */
+	int 			plen_iinfo;
+	struct list_head	falh_iinfo;
+
+	struct hlist_head	list;
+	struct rcu_head		rcu;
 };
 
 struct leaf_info {
-	struct hlist_node hlist;
-	struct rcu_head rcu;
-	int plen;
-	struct list_head falh;
+	struct hlist_node	hlist;
+	int			plen;
+	struct list_head	falh;
+	struct rcu_head		rcu;
 };
 
 struct tnode {
@@ -364,11 +374,13 @@ static inline void tnode_free(struct tnode *tn)
 		call_rcu(&tn->rcu, __tnode_free_rcu);
 }
 
-static struct leaf *leaf_new(void)
+static struct leaf *leaf_new(int plen)
 {
 	struct leaf *l = kmalloc(sizeof(struct leaf),  GFP_KERNEL);
 	if (l) {
 		l->parent = T_LEAF;
+		l->plen_iinfo = plen;
+		INIT_LIST_HEAD(&l->falh_iinfo);
 		INIT_HLIST_HEAD(&l->list);
 	}
 	return l;
@@ -890,7 +902,16 @@ static struct leaf_info *find_leaf_info(struct leaf *l, int plen)
 
 static inline struct list_head * get_fa_head(struct leaf *l, int plen)
 {
-	struct leaf_info *li = find_leaf_info(l, plen);
+	struct leaf_info *li;
+
+	if (l->plen_iinfo >= 0) {
+		if(l->plen_iinfo == plen)
+			return &l->falh_iinfo;
+		else
+			return NULL;
+	}
+
+	li = find_leaf_info(l, plen);
 
 	if (!li)
 		return NULL;
@@ -1036,6 +1057,21 @@ static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 
 	if (n != NULL && IS_LEAF(n) && tkey_equals(key, n->key)) {
 		l = (struct leaf *) n;
+
+		/* 
+		 * Check if it is a FP-leaf if so we have to convert 
+		 * it to an ordinary leaf with leaf_infos
+		 */
+
+		if( l->plen_iinfo >= 0 ) {
+			li = leaf_info_new(l->plen_iinfo);
+			if (!li)
+				return NULL;
+
+			insert_leaf_info(&l->list, li);
+			l->plen_iinfo = -1;
+		}
+
 		li = leaf_info_new(plen);
 
 		if (!li)
@@ -1045,21 +1081,16 @@ static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 		insert_leaf_info(&l->list, li);
 		goto done;
 	}
-	l = leaf_new();
+
+	/* Insert a FP-leaf */
+	l = leaf_new(plen);
 
 	if (!l)
 		return NULL;
 
 	l->key = key;
-	li = leaf_info_new(plen);
-
-	if (!li) {
-		tnode_free((struct tnode *) l);
-		return NULL;
-	}
+	fa_head = &l->falh_iinfo;
 
-	fa_head = &li->falh;
-	insert_leaf_info(&l->list, li);
 
 	if (t->trie && n == NULL) {
 		/* Case 2: n is NULL, and will just insert a new leaf */
@@ -1089,7 +1120,6 @@ static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 		}
 
 		if (!tn) {
-			free_leaf_info(li);
 			tnode_free((struct tnode *) l);
 			return NULL;
 		}
@@ -1285,6 +1315,20 @@ static inline int check_leaf(struct trie *t, struct leaf *l,
 	struct hlist_head *hhead = &l->list;
 	struct hlist_node *node;
 
+	i = l->plen_iinfo;	
+	if( i >= 0) {   /* FP-leaf */
+		mask = inet_make_mask(i);
+		err = fib_semantic_match(&l->falh_iinfo, flp, res, htonl(l->key), mask, i);
+		if(err <= 0 ) {
+			*plen = i;
+#ifdef CONFIG_IP_FIB_TRIE_STATS
+			t->stats.semantic_match_passed++;
+#endif
+		return err;
+		}
+	}
+
+
 	hlist_for_each_entry_rcu(li, node, hhead, hlist) {
 		i = li->plen;
 		mask = inet_make_mask(i);
@@ -1614,7 +1658,7 @@ static int fn_trie_delete(struct fib_table *tb, struct fib_config *cfg)
 	li = find_leaf_info(l, plen);
 
 	list_del_rcu(&fa->fa_list);
-
+// FIXME
 	if (list_empty(fa_head)) {
 		hlist_del_rcu(&li->hlist);
 		free_leaf_info(li);
@@ -2336,12 +2380,11 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 		seq_indent(seq, iter->depth);
 		seq_printf(seq, "  |-- %d.%d.%d.%d\n", NIPQUAD(val));
 		for (i = 32; i >= 0; i--) {
-			struct leaf_info *li = find_leaf_info(l, i);
+			struct list_head *fa_head = get_fa_head(l, i);
+			if (fa_head) {
 
-			if (li) {
 				struct fib_alias *fa;
-
-				list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+				list_for_each_entry_rcu(fa, fa_head, fa_list) {
 					char buf1[32], buf2[32];
 
 					seq_indent(seq, iter->depth+1);
@@ -2424,17 +2467,18 @@ static int fib_route_seq_show(struct seq_file *seq, void *v)
 		return 0;
 
 	for (i=32; i>=0; i--) {
-		struct leaf_info *li = find_leaf_info(l, i);
+		//struct leaf_info *li = find_leaf_info(l, i);
+		struct list_head *fa_head = get_fa_head(l, i);
 		struct fib_alias *fa;
 		__be32 mask, prefix;
 
-		if (!li)
+		if (!fa_head)
 			continue;
 
-		mask = inet_make_mask(li->plen);
+		mask = inet_make_mask(i);
 		prefix = htonl(l->key);
 
-		list_for_each_entry_rcu(fa, &li->falh, fa_list) {
+		list_for_each_entry_rcu(fa, fa_head, fa_list) {
 			const struct fib_info *fi = fa->fa_info;
 			unsigned flags = fib_flag_trans(fa->fa_type, mask, fi);
 

^ permalink raw reply related

* Re: [RFC 6/6] fib_trie: combine leaf and info
From: Stephen Hemminger @ 2008-01-15 16:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, robert.olsson, netdev
In-Reply-To: <478C4FBE.5040308@cosmosbay.com>

On Tue, 15 Jan 2008 07:16:30 +0100
Eric Dumazet <dada1@cosmosbay.com> wrote:

> Eric Dumazet a écrit :
> > Stephen Hemminger a écrit :
> >> Combine the prefix information and the leaf together into one
> >> allocation. This is furthur simplified by converting the hlist
> >> into a simple bitfield.
> >>
> >> Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
> >>
> >> This patch is experimental (ie it boots and loads routes), but
> >> is slower for the 163,000 route dump test.
> >>
> > 
> > Its also very memory expensive. That was not Robert suggestion I believe.
> > 
> > Its suggestion is to embedd one info into each leaves.
> > 
> > Please find attached to this mail a preliminary and incomplete patch I 
> > wrote this morning before coffee :), to get the idea.
> 
> Oops, patch reversed, sorry, and against another work pending in my tree.
> 
> Now time for cofee :)
> 

Okay, I would rather see the leaf_info explicit inside the leaf, also
your scheme probably breaks if I add two prefixes and then delete the first.
Let me have a go at it.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: Packetlost when "tc qdisc del dev eth0 root"
From: slavon @ 2008-01-15 16:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478CD9D6.3000504@trash.net>

I understand. Thanks!

> Badalian Vyacheslav wrote:
>> Hello all. Have packetlost when do "tc qdisc del dev eth0 root".
>>
>> look:
>>
>> slavon ~ # ping -f 87.255.1.134
>> PING 87.255.1.134 (87.255.1.134) 56(84) bytes of data.
>> .
>> .....................................................................................................................................................................................................................................................................   
>> .
>> --- 87.255.1.134 ping statistics ---
>> 60811 packets transmitted, 60544 received, 0% packet loss, time 39528ms
>> rtt min/avg/max/mdev = 0.203/0.579/3227.655/13.124 ms, pipe 219,   
>> ipg/ewma 0.650/2.197 ms
>>
>> Its normal?
>
> Yes, packets in the old qdisc are lost.
>
>> Maybe if tc do changes - need create second queue (hash of rules or  
>>  how you named it?) and do changes at it. Then replace old queue   
>> rules by created new.
>> Logic -
>> 1. Do snapshot
>> 2. Do changes in shapshot
>> 3. All new packets go to snapshot
>> 4. If old queue not have packets - delete it.
>> 5. Snapshot its default.
>
>
> That doesn't really work since qdiscs keep internal state that
> in large parts depends on the packets queued. Take the qlen as
> a simple example, the new qdisc doesn't know about the packets
> in the old one and will exceed the limit.



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


^ permalink raw reply

* Re: Packetlost when "tc qdisc del dev eth0 root"
From: Patrick McHardy @ 2008-01-15 16:05 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev
In-Reply-To: <478C94B7.3070503@bigtelecom.ru>

Badalian Vyacheslav wrote:
> Hello all. Have packetlost when do "tc qdisc del dev eth0 root".
> 
> look:
> 
> slavon ~ # ping -f 87.255.1.134
> PING 87.255.1.134 (87.255.1.134) 56(84) bytes of data.
> .
> ..................................................................................................................................................................................................................................................................... 
> 
> .
> --- 87.255.1.134 ping statistics ---
> 60811 packets transmitted, 60544 received, 0% packet loss, time 39528ms
> rtt min/avg/max/mdev = 0.203/0.579/3227.655/13.124 ms, pipe 219, 
> ipg/ewma 0.650/2.197 ms
> 
> Its normal?

Yes, packets in the old qdisc are lost.

> Maybe if tc do changes - need create second queue (hash of rules or how 
> you named it?) and do changes at it. Then replace old queue rules by 
> created new.
> Logic -
> 1. Do snapshot
> 2. Do changes in shapshot
> 3. All new packets go to snapshot
> 4. If old queue not have packets - delete it.
> 5. Snapshot its default.


That doesn't really work since qdiscs keep internal state that
in large parts depends on the packets queued. Take the qlen as
a simple example, the new qdisc doesn't know about the packets
in the old one and will exceed the limit.

^ permalink raw reply

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: slavon @ 2008-01-15 16:04 UTC (permalink / raw)
  To: Frans Pop; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <200801151504.10047.elendil@planet.nl>

Quoting Frans Pop <elendil@planet.nl>:

> On Tuesday 15 January 2008, David Miller wrote:
>> From: Frans Pop <elendil@planet.nl>
>> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>>
>> Does this make the problem go away?
>
> Yes, it very much looks like that solves it.
> I ran with the patch for 6 hours or so without any errors. I then switched
> back to an unpatched kernel and they reappeared immediately.
>
>> (Note this isn't the final correct patch we should apply.  There
>>  is no reason why this revert back to the older ->poll() logic
>>  here should have any effect on the TX hang triggering...)
>
> s/no reason/no obvious reason/ ? ;-)
>
> Cheers,
> FJP
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

Hello.

I also try your patch (apply to 2.6.24-rc7-git2)

I catch this message in dmesg
[ 1771.796954] e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 1771.796957]   Tx Queue             <0>
[ 1771.796958]   TDH                  <54>
[ 1771.796959]   TDT                  <54>
[ 1771.796960]   next_to_use          <54>
[ 1771.796961]   next_to_clean        <a9>
[ 1771.796962] buffer_info[next_to_clean]
[ 1771.796963]   time_stamp           <14d72e>
[ 1771.796964]   next_to_watch        <a9>
[ 1771.796965]   jiffies              <14ddd3>
[ 1771.796966]   next_to_watch.status <1>

Thanks.


----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


^ permalink raw reply

* Re: Not understand some in htb_do_events function
From: Patrick McHardy @ 2008-01-15 15:54 UTC (permalink / raw)
  To: Martin Devera; +Cc: Badalian Vyacheslav, netdev
In-Reply-To: <478CA02A.1070308@cdi.cz>

Martin Devera wrote:
> Patrick McHardy wrote:
>> Badalian Vyacheslav wrote:
>>> Hello all.
>>> I have many messages like "htb: too many events !" in dmesg.
>>>
>>> Try to see code and find that function try do 500 events at call.
>>> Hm... may anyone ask why 500? Why its not dynamic value based on 
>>> performance of PC?
>>
>>
>> Thats a good question, I wonder why it is limited at all.
>> Martin, any hints?
>>
> Hi, I recently replied someone to the same question:
> 
>  > it is possible when during one jiffie (1 or 10ms) more than 500 classes
>  > changed its state. It is meant to protect your system from livelock.
>  > The constant should be set to something like
>  > bogomips/bogocomplexity_of_state_change but it was not done.
> 
> the solution I have in my mind is to change
>    if (net_ratelimit())
>                  printk(KERN_WARNING "htb: too many events !\n");
>          return HZ/10;
>  to
>    return 1;
> 
> to drain extra events asap. It the time of writing I was not able to
> come with better solution and there were more bugs related to this
> part of code than now.

So this was meant to protect against endless loops?

> We want way to smooth big burst of events over more dequeue invocations
> in order to not slow dequeue too much. Constant 500 is max. allowed
> "slowdown" of dequeue.
> Any bright idea how to do it more elegant, Patrick ?


Unfortunately not, but I believe simply removing the limit
completely would be better than picking an arbitary value.

^ permalink raw reply

* Re: [NETFILTER]: Supress some sparse warnings
From: Patrick McHardy @ 2008-01-15 15:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org, Netfilter Development Mailinglist
In-Reply-To: <20080115124046.1c10dfeb.dada1@cosmosbay.com>

Eric Dumazet wrote:
> Hi Patrick
> 
> Please find some sparse cleanups, against net-2.6.25
> 
> Thank you
> 
> [NETFILTER]: Supress some sparse warnings
> 
>   CHECK   net/netfilter/nf_conntrack_expect.c
> net/netfilter/nf_conntrack_expect.c:429:13: warning: context imbalance in 'exp_seq_start' - wrong count at exit
> net/netfilter/nf_conntrack_expect.c:441:13: warning: context imbalance in 'exp_seq_stop' - unexpected unlock
>   CHECK   net/netfilter/nf_log.c
> net/netfilter/nf_log.c:105:13: warning: context imbalance in 'seq_start' - wrong count at exit
> net/netfilter/nf_log.c:125:13: warning: context imbalance in 'seq_stop' - unexpected unlock
>   CHECK   net/netfilter/nfnetlink_queue.c
> net/netfilter/nfnetlink_queue.c:363:7: warning: symbol 'size' shadows an earlier one
> net/netfilter/nfnetlink_queue.c:217:9: originally declared here
> net/netfilter/nfnetlink_queue.c:847:13: warning: context imbalance in 'seq_start' - wrong count at exit
> net/netfilter/nfnetlink_queue.c:859:13: warning: context imbalance in 'seq_stop' - unexpected unlock

Applied, thanks Eric.


^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Radoslaw Szkodzinski @ 2008-01-15 15:17 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Jarek Poplawski, David Miller, netdev, linux-kernel
In-Reply-To: <478CC76B.1020804@nortel.com>

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

On Tue, 15 Jan 2008 08:47:07 -0600
"Chris Friesen" <cfriesen@nortel.com> wrote:

> Jarek Poplawski wrote:
> 
> > IMHO, checking this with a current stable, which probably you are going
> > to do some day, anyway, should be 100% acceptable: giving some input to
> > netdev, while still working for yourself.
> 
> While I would love to do this, it's not that simple.
> 
> Some of our hardware is not supported on mainline, so we need per-kernel 
> version patches to even bring up the blade.  The blades netboot via a 
> jumbo-frame network, so kernel extensions are needed to handle setting 
> the MTU before mounting the rootfs.

Why? Can't you use a small initramfs to set it up?

> The blade in question uses CKRM 
> which doesn't exist for newer kernels so the problem may simply be 
> hidden by scheduling differences.

Current spiritual successor is Ingo's realtime patchset I guess.

> The userspace application uses other 
> kernel features that are not in mainline (and likely some of them won't 
> ever be in mainline--I know because I've tried).

What would these be? Some proc or sysfs files that were removed or
renamed?
Maybe they can be worked around w/o changing the application at all, or
very minor changes.

Also, be sure to check if there are pauses with other CPU hogs.

> Basically, the number of changes required for our environment makes it 
> difficult to just boot up the latest kernel.

Yes, adding an initramfs is non-trivial (some busybox + shell
scripting), but not all that hard either.
And maintaining one is easier than patching the kernel.
(usually 0 maintenance, maybe minor fixes sometimes)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Vlad Yasevich @ 2008-01-15 15:09 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Eric Dumazet, Ray Lee, netdev, linux-kernel
In-Reply-To: <478BBFE3.9030707@nortel.com>

Chris Friesen wrote:
> Eric Dumazet wrote:
>> Chris Friesen a écrit :
> 
>>> Based on the profiling information we're spending time in 
>>> sctp_endpoint_lookup_assoc() which doesn't actually use hashes, so I 
>>> can't see how the hash would be related.  I'm pretty new to SCTP 
>>> though, so I may be missing something.
>>
>> Well, it does use hashes :)
>>
>>        hash = sctp_assoc_hashfn(ep->base.bind_addr.port, rport);
>>        head = &sctp_assoc_hashtable[hash];
>>        read_lock(&head->lock);
>>        sctp_for_each_hentry(epb, node, &head->chain) {
>>            /* maybe your machine is traversing here a *really* long 
>> chain */
>>            }
> 
> 
> The latest released kernel doesn't have this code, it was only added in 
> November.  The SCTP maintainer just pointed me to the patch, and made 
> some other suggestions as well.
> 

Yes,  the hash code only got added to 2.6.24.  Before that, it was
a linear list traversal which sucked.

I need to take a look at the locking to see if we can further reduce
the time that bottom half is disabled.

-vlad

^ permalink raw reply

* [PATCH 1/4] [ROSE] simplified rose_get_route()
From: Bernard Pidoux @ 2008-01-15 15:00 UTC (permalink / raw)
  To: Ralf Baechle DL5RB, Alexey Dobriyan, David Miller,
	Linux Netdev List, Eric Dumazet
In-Reply-To: <47630274.1080706@ccr.jussieu.fr>

Hi,

This patch is send to replace preceding commit :

 From fd66cc115e058b2fc63a0e26aa73f1d27113105a Mon Sep 17 00:00:00 2001
From: Bernard Pidoux <f6bvp@amsat.org>
Date: Thu, 10 Jan 2008 23:10:44 +0100
Subject: [PATCH 1/4] [ROSE] new rose_get_route() function


I removed unnecessary lines of code copied from rose_get_neigh().
The function is now called with fewer arguments and if
a NULL is returned then the next call to rose_transmit_clear_request()
will manage the error code. I verified that the frame routing was
actually handled as required on a FPAC/ROSE node with heavy BBS traffic 
through 8 adjacent nodes connected through radio ports and Internet 
AXUDP / AXIP connections.


 From 6708b3853a5ecf5c185942b3757223e451af1960 Mon Sep 17 00:00:00 2001
From: Bernard Pidoux <f6bvp@amsat.org>
Date: Sun, 13 Jan 2008 20:25:40 +0100
Subject: [PATCH 1/4] [ROSE] simplified rose_get_route()

rose_get_neigh() was called by two different functions.
Firstly, by rose_connect() in order to establish connections to
adjacent rose nodes. This worked correctly.
Secondly, it was called by rose_route_frame() to find a route
via an adjacent node. This was not working efficiently, for the
whole node neighbour list was not checked and the proper test
was not performed in order to check if a node was already connected.
We create new function rose_get_route() to achieve this.
It returns a ROSE node address for sending a frame via a connected node
to the specified destination address.If a NULL is returned, the
previous program behavior is performed. ROSE tries to initiate a connect 
to the appropriate adjacent node.

Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
---
  net/rose/rose_route.c |   24 +++++++++++++++++++++---
  1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 540c0f2..5d6fac4 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -662,6 +662,25 @@ struct rose_route *rose_route_free_lci(unsigned int 
lci, struct rose_neigh *neig
  }

  /*
+ *     Find an opened route given a ROSE address.
+ */
+static struct rose_neigh *rose_get_route(rose_address *addr)
+{
+       struct rose_node *node;
+       int i;
+
+       for (node = rose_node_list; node != NULL; node = node->next) {
+               if (rosecmpm(addr, &node->address, node->mask) == 0) {
+                       for (i = 0; i < node->count; i++) {
+                               if (node->neighbour[i]->restarted)
+                                       return node->neighbour[i];
+                       }
+               }
+       }
+       return NULL;
+}
+
+/*
   *     Find a neighbour given a ROSE address.
   */
  struct rose_neigh *rose_get_neigh(rose_address *addr, unsigned char 
*cause,
@@ -842,7 +861,6 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
         struct sock *sk;
         unsigned short frametype;
         unsigned int lci, new_lci;
-       unsigned char cause, diagnostic;
         struct net_device *dev;
         int len, res = 0;
         char buf[11];
@@ -1019,8 +1037,8 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb 
*ax25)
                 rose_route = rose_route->next;
         }

-       if ((new_neigh = rose_get_neigh(dest_addr, &cause, &diagnostic)) 
== NULL) {
-               rose_transmit_clear_request(rose_neigh, lci, cause, 
diagnostic);
+       if ((new_neigh = rose_get_route(dest_addr)) == NULL) {
+               rose_transmit_clear_request(rose_neigh, lci, 
ROSE_NOT_OBTAINABLE, 0);
                 goto out;
         }

--
1.5.3.7


^ permalink raw reply related

* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-15 14:47 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20080115071950.GA1696@ff.dom.local>

Jarek Poplawski wrote:

> IMHO, checking this with a current stable, which probably you are going
> to do some day, anyway, should be 100% acceptable: giving some input to
> netdev, while still working for yourself.

While I would love to do this, it's not that simple.

Some of our hardware is not supported on mainline, so we need per-kernel 
version patches to even bring up the blade.  The blades netboot via a 
jumbo-frame network, so kernel extensions are needed to handle setting 
the MTU before mounting the rootfs.  The blade in question uses CKRM 
which doesn't exist for newer kernels so the problem may simply be 
hidden by scheduling differences.  The userspace application uses other 
kernel features that are not in mainline (and likely some of them won't 
ever be in mainline--I know because I've tried).

Basically, the number of changes required for our environment makes it 
difficult to just boot up the latest kernel.

Chris

^ permalink raw reply

* Re: [PATCH 2/4] [ROSE] rose_get_route() template
From: Bernard Pidoux @ 2008-01-15 14:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ralf Baechle DL5RB, Alexey Dobriyan, David Miller,
	Linux Netdev List
In-Reply-To: <47893127.2080606@ccr.jussieu.fr>

Hi,

I wrote a "simplified" get_route() function.
It is declared as static following judicious Eric's remark.

Then, the following patch of include/net/rose.h is no more
necessary.

A new commit for rose_get_route() will be presented in next
message.

Thank you Eric for pushing me to reexamine my code.

Regards,

Bernard P.



Bernard Pidoux wrote:
> 
> 
> Eric Dumazet wrote:
>> Bernard Pidoux a écrit :
>>>  From 46bccce1e660a39bcc8f8cf87d4c17de33f4ba48 Mon Sep 17 00:00:00 2001
>>> From: Bernard Pidoux <f6bvp@amsat.org>
>>> Date: Thu, 10 Jan 2008 23:01:46 +0100
>>> Subject: [PATCH 2/4] [ROSE] template declaration for rose_get_route()
>>>
>>> Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
>>> ---
>>>  include/net/rose.h |    1 +
>>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/net/rose.h b/include/net/rose.h
>>> index e5bb084..d3ab453 100644
>>> --- a/include/net/rose.h
>>> +++ b/include/net/rose.h
>>> @@ -202,6 +202,7 @@ extern struct net_device *rose_dev_first(void);
>>>  extern struct net_device *rose_dev_get(rose_address *);
>>>  extern struct rose_route *rose_route_free_lci(unsigned int, struct 
>>> rose_neigh *);
>>>  extern struct rose_neigh *rose_get_neigh(rose_address *, unsigned 
>>> char *, unsigned char *);
>>> +extern struct rose_neigh *rose_get_route(rose_address *, unsigned 
>>> char *, unsigned char *);
>>>  extern int  rose_rt_ioctl(unsigned int, void __user *);
>>>  extern void rose_link_failed(ax25_cb *, int);
>>>  extern int  rose_route_frame(struct sk_buff *, ax25_cb *);
>>> -- 
>>
>> Strange... if rose_get_route() is used only in net/rose/rose_route.c, 
>> why dont you define it static, and not extern in include/net/rose.h ?
>>
>>
>>
> 
> I agree. You are perfectly right.
> There is no need to declare rose_get_route() external.
> I stupidly copied rose_get_neigh()definition from which I derived 
> rose_get_route();
> 
> Also I am not sure that setting cause and diagnostic is necessary, as 
> they are not used by the calling function.
> 
> By the way. I made a typo in [PATCH 4/4].
> 
> Instead of "Initial connection to rose neighbour nodes was unusually,
> t0 timer was blocked and application program could not
> use socket."
> 
> One should read
> "Initial connection to rose neighbour nodes was unusually,
> long, t0 timer was blocked and application program could not
> use socket."
> 
> Bernard P.
> 
> 
> 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox