* neighbour table RCU @ 2009-08-31 22:04 Stephen Hemminger 2009-09-01 6:50 ` Eric Dumazet 0 siblings, 1 reply; 9+ messages in thread From: Stephen Hemminger @ 2009-08-31 22:04 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev Looking at the neighbour table, it should be possible to get rid of the two reader/writer locks. The hash table lock is pretty amenable to RCU, but the dynamic resizing makes it non-trivial. Thinking of using a combination of RCU and sequence counts so that the reader would just rescan if resize was in progress. The reader/writer lock on the neighbour entry is more of a problem. Probably would be simpler/faster to change it into a spinlock and be done with it. The reader/writer lock is also used for the proxy list hash table, but that can just be a simple spinlock. -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-08-31 22:04 neighbour table RCU Stephen Hemminger @ 2009-09-01 6:50 ` Eric Dumazet 2009-09-01 15:55 ` Octavian Purdila 2009-09-01 15:59 ` Stephen Hemminger 0 siblings, 2 replies; 9+ messages in thread From: Eric Dumazet @ 2009-09-01 6:50 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev Stephen Hemminger a écrit : > Looking at the neighbour table, it should be possible to get > rid of the two reader/writer locks. The hash table lock is pretty > amenable to RCU, but the dynamic resizing makes it non-trivial. > Thinking of using a combination of RCU and sequence counts so that the > reader would just rescan if resize was in progress. I am not sure neigh_tbl_lock rwlock should be changed, I did not see any contention on it. > > The reader/writer lock on the neighbour entry is more of a problem. > Probably would be simpler/faster to change it into a spinlock and > be done with it. > > The reader/writer lock is also used for the proxy list hash table, > but that can just be a simple spinlock. > This is probably is the only thing we want to do at this moment, halving atomic ops on neigh_resolve_output() But why neigh_resolve_output() was called so much in the bench is the question... ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-09-01 6:50 ` Eric Dumazet @ 2009-09-01 15:55 ` Octavian Purdila 2009-09-01 16:14 ` Eric Dumazet 2009-09-01 16:23 ` Stephen Hemminger 2009-09-01 15:59 ` Stephen Hemminger 1 sibling, 2 replies; 9+ messages in thread From: Octavian Purdila @ 2009-09-01 15:55 UTC (permalink / raw) To: Eric Dumazet; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote: > Stephen Hemminger a écrit : > > Looking at the neighbour table, it should be possible to get > > rid of the two reader/writer locks. The hash table lock is pretty > > amenable to RCU, but the dynamic resizing makes it non-trivial. > > Thinking of using a combination of RCU and sequence counts so that the > > reader would just rescan if resize was in progress. > > I am not sure neigh_tbl_lock rwlock should be changed, I did not > see any contention on it. > Speaking about neighbour optimizations, here is a RFC patch which makes the tables double linked, for constant time deletion. It has given us a significant performance improvement - in less then usual setups though, with lots of neighbours. Would something like this be acceptable for upstream? (pardon the p4 diff dump :) - but I think it will give a rough idea, if acceptable will clean it up and properly submit it) BTW, would switching to list_head be better? Thanks, tavi ==== //packages/linux-2.6.7/main/src/include/net/neighbour.h#2 (text) ==== @@ -56,6 +56,7 @@ struct neigh_parms { struct neigh_parms *next; + struct neigh_parms **pprev; int (*neigh_setup)(struct neighbour *); struct neigh_table *tbl; int entries; ==== //packages/linux-2.6.7/main/src/net/core/neighbour.c#3 (text) ==== @@ -1127,8 +1127,10 @@ } p->sysctl_table = NULL; write_lock_bh(&tbl->lock); - p->next = tbl->parms.next; + if ((p->next = tbl->parms.next)) + p->next->pprev = &p->next; tbl->parms.next = p; + p->pprev = &tbl->parms.next; write_unlock_bh(&tbl->lock); } return p; @@ -1136,21 +1138,14 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms) { - struct neigh_parms **p; - if (!parms || parms == &tbl->parms) return; write_lock_bh(&tbl->lock); - for (p = &tbl->parms.next; *p; p = &(*p)->next) { - if (*p == parms) { - *p = parms->next; - write_unlock_bh(&tbl->lock); - kfree(parms); - return; - } - } + if ((*parms->pprev = parms->next)) + parms->next->pprev = parms->pprev; write_unlock_bh(&tbl->lock); - NEIGH_PRINTK1("neigh_parms_release: not found\n"); + kfree(parms); + return; } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-09-01 15:55 ` Octavian Purdila @ 2009-09-01 16:14 ` Eric Dumazet 2009-09-01 16:56 ` Octavian Purdila 2009-09-01 16:23 ` Stephen Hemminger 1 sibling, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2009-09-01 16:14 UTC (permalink / raw) To: Octavian Purdila; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev Octavian Purdila a écrit : > On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote: >> Stephen Hemminger a écrit : >>> Looking at the neighbour table, it should be possible to get >>> rid of the two reader/writer locks. The hash table lock is pretty >>> amenable to RCU, but the dynamic resizing makes it non-trivial. >>> Thinking of using a combination of RCU and sequence counts so that the >>> reader would just rescan if resize was in progress. >> I am not sure neigh_tbl_lock rwlock should be changed, I did not >> see any contention on it. >> > > Speaking about neighbour optimizations, here is a RFC patch which makes the > tables double linked, for constant time deletion. It has given us a significant > performance improvement - in less then usual setups though, with lots of > neighbours. How many "struct neigh_parms" do you have in your setups, and what is the frequency of neigh_parms_release() calls ??? > > Would something like this be acceptable for upstream? (pardon the p4 diff dump > :) - but I think it will give a rough idea, if acceptable will clean it up and > properly submit it) Seems straightforward > > BTW, would switching to list_head be better? Yes, definitly :) > > Thanks, > tavi > > ==== //packages/linux-2.6.7/main/src/include/net/neighbour.h#2 (text) ==== > > @@ -56,6 +56,7 @@ > struct neigh_parms > { > struct neigh_parms *next; > + struct neigh_parms **pprev; > int (*neigh_setup)(struct neighbour *); > struct neigh_table *tbl; > int entries; > > ==== //packages/linux-2.6.7/main/src/net/core/neighbour.c#3 (text) ==== > > @@ -1127,8 +1127,10 @@ > } > p->sysctl_table = NULL; > write_lock_bh(&tbl->lock); > - p->next = tbl->parms.next; > + if ((p->next = tbl->parms.next)) > + p->next->pprev = &p->next; > tbl->parms.next = p; > + p->pprev = &tbl->parms.next; > write_unlock_bh(&tbl->lock); > } > return p; > @@ -1136,21 +1138,14 @@ > > void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms) > { > - struct neigh_parms **p; > - > if (!parms || parms == &tbl->parms) > return; > write_lock_bh(&tbl->lock); > - for (p = &tbl->parms.next; *p; p = &(*p)->next) { > - if (*p == parms) { > - *p = parms->next; > - write_unlock_bh(&tbl->lock); > - kfree(parms); > - return; > - } > - } > + if ((*parms->pprev = parms->next)) > + parms->next->pprev = parms->pprev; > write_unlock_bh(&tbl->lock); > - NEIGH_PRINTK1("neigh_parms_release: not found\n"); > + kfree(parms); > + return; > } > -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-09-01 16:14 ` Eric Dumazet @ 2009-09-01 16:56 ` Octavian Purdila 0 siblings, 0 replies; 9+ messages in thread From: Octavian Purdila @ 2009-09-01 16:56 UTC (permalink / raw) To: Eric Dumazet; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev On Tuesday 01 September 2009 19:14:37 Eric Dumazet wrote: > Octavian Purdila a écrit : > > On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote: > >> Stephen Hemminger a écrit : > >>> Looking at the neighbour table, it should be possible to get > >>> rid of the two reader/writer locks. The hash table lock is pretty > >>> amenable to RCU, but the dynamic resizing makes it non-trivial. > >>> Thinking of using a combination of RCU and sequence counts so that the > >>> reader would just rescan if resize was in progress. > >> > >> I am not sure neigh_tbl_lock rwlock should be changed, I did not > >> see any contention on it. > > > > Speaking about neighbour optimizations, here is a RFC patch which makes > > the tables double linked, for constant time deletion. It has given us a > > significant performance improvement - in less then usual setups though, > > with lots of neighbours. > > How many "struct neigh_parms" do you have in your setups, and > what is the frequency of neigh_parms_release() calls ??? > Each L3 interface part (at least for IPv4/IPv6 and I see DecNet as well) has a neigh_parms associated. And we use up to 128K interfaces in certain tests scenarios ;) In our case the interfaces are created/destroyed dynamically, so when using a large number of interfaces the test cleanup takes forever without this (and other) patch(es). Thanks, tavi ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-09-01 15:55 ` Octavian Purdila 2009-09-01 16:14 ` Eric Dumazet @ 2009-09-01 16:23 ` Stephen Hemminger 1 sibling, 0 replies; 9+ messages in thread From: Stephen Hemminger @ 2009-09-01 16:23 UTC (permalink / raw) To: Octavian Purdila; +Cc: Eric Dumazet, Lucian Adrian Grijincu, netdev On Tue, 1 Sep 2009 18:55:34 +0300 Octavian Purdila <opurdila@ixiacom.com> wrote: > On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote: > > Stephen Hemminger a écrit : > > > Looking at the neighbour table, it should be possible to get > > > rid of the two reader/writer locks. The hash table lock is pretty > > > amenable to RCU, but the dynamic resizing makes it non-trivial. > > > Thinking of using a combination of RCU and sequence counts so that the > > > reader would just rescan if resize was in progress. > > > > I am not sure neigh_tbl_lock rwlock should be changed, I did not > > see any contention on it. > > > > Speaking about neighbour optimizations, here is a RFC patch which makes the > tables double linked, for constant time deletion. It has given us a significant > performance improvement - in less then usual setups though, with lots of > neighbours. > > Would something like this be acceptable for upstream? (pardon the p4 diff dump > :) - but I think it will give a rough idea, if acceptable will clean it up and > properly submit it) > > BTW, would switching to list_head be better? Use hlist for the neighbour table. It has the right properties and makes future RCU easier. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-09-01 6:50 ` Eric Dumazet 2009-09-01 15:55 ` Octavian Purdila @ 2009-09-01 15:59 ` Stephen Hemminger 2009-09-01 16:13 ` Eric Dumazet 1 sibling, 1 reply; 9+ messages in thread From: Stephen Hemminger @ 2009-09-01 15:59 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev On Tue, 01 Sep 2009 08:50:17 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Stephen Hemminger a écrit : > > Looking at the neighbour table, it should be possible to get > > rid of the two reader/writer locks. The hash table lock is pretty > > amenable to RCU, but the dynamic resizing makes it non-trivial. > > Thinking of using a combination of RCU and sequence counts so that the > > reader would just rescan if resize was in progress. > > I am not sure neigh_tbl_lock rwlock should be changed, I did not > see any contention on it. > > > > > The reader/writer lock on the neighbour entry is more of a problem. > > Probably would be simpler/faster to change it into a spinlock and > > be done with it. > > > > The reader/writer lock is also used for the proxy list hash table, > > but that can just be a simple spinlock. > > > > This is probably is the only thing we want to do at this moment, > halving atomic ops on neigh_resolve_output() > > But why neigh_resolve_output() was called so much in the bench > is the question... > Every packet has to have an ARP resolution. -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-09-01 15:59 ` Stephen Hemminger @ 2009-09-01 16:13 ` Eric Dumazet 2009-09-01 21:24 ` Stephen Hemminger 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2009-09-01 16:13 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev Stephen Hemminger a écrit : > On Tue, 01 Sep 2009 08:50:17 +0200 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> Stephen Hemminger a écrit : >>> Looking at the neighbour table, it should be possible to get >>> rid of the two reader/writer locks. The hash table lock is pretty >>> amenable to RCU, but the dynamic resizing makes it non-trivial. >>> Thinking of using a combination of RCU and sequence counts so that the >>> reader would just rescan if resize was in progress. >> I am not sure neigh_tbl_lock rwlock should be changed, I did not >> see any contention on it. >> >>> The reader/writer lock on the neighbour entry is more of a problem. >>> Probably would be simpler/faster to change it into a spinlock and >>> be done with it. >>> >>> The reader/writer lock is also used for the proxy list hash table, >>> but that can just be a simple spinlock. >>> >> This is probably is the only thing we want to do at this moment, >> halving atomic ops on neigh_resolve_output() >> >> But why neigh_resolve_output() was called so much in the bench >> is the question... >> > > Every packet has to have an ARP resolution. > Sure, but I thought we had a cache ? static inline int ip_finish_output2(struct sk_buff *skb) { ... if (dst->hh) return neigh_hh_output(dst->hh, skb); else if (dst->neighbour) return dst->neighbour->output(skb); << should fill cache first time >> ... } in my pktgen benches, I always hit same dst so should take the hh cache ? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU 2009-09-01 16:13 ` Eric Dumazet @ 2009-09-01 21:24 ` Stephen Hemminger 0 siblings, 0 replies; 9+ messages in thread From: Stephen Hemminger @ 2009-09-01 21:24 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev On Tue, 01 Sep 2009 18:13:40 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Stephen Hemminger a écrit : > > On Tue, 01 Sep 2009 08:50:17 +0200 > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > >> Stephen Hemminger a écrit : > >>> Looking at the neighbour table, it should be possible to get > >>> rid of the two reader/writer locks. The hash table lock is pretty > >>> amenable to RCU, but the dynamic resizing makes it non-trivial. > >>> Thinking of using a combination of RCU and sequence counts so that the > >>> reader would just rescan if resize was in progress. > >> I am not sure neigh_tbl_lock rwlock should be changed, I did not > >> see any contention on it. > >> > >>> The reader/writer lock on the neighbour entry is more of a problem. > >>> Probably would be simpler/faster to change it into a spinlock and > >>> be done with it. > >>> > >>> The reader/writer lock is also used for the proxy list hash table, > >>> but that can just be a simple spinlock. > >>> > >> This is probably is the only thing we want to do at this moment, > >> halving atomic ops on neigh_resolve_output() > >> > >> But why neigh_resolve_output() was called so much in the bench > >> is the question... > >> > > > > Every packet has to have an ARP resolution. > > > > Sure, but I thought we had a cache ? > > static inline int ip_finish_output2(struct sk_buff *skb) > { > ... > if (dst->hh) > return neigh_hh_output(dst->hh, skb); > else if (dst->neighbour) > return dst->neighbour->output(skb); << should fill cache first time >> > ... > } > > in my pktgen benches, I always hit same dst so should take the hh cache ? I ping the remote host before starting pktgen, that way I figure the ARP and cache is available. -- ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-09-01 21:24 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-08-31 22:04 neighbour table RCU Stephen Hemminger 2009-09-01 6:50 ` Eric Dumazet 2009-09-01 15:55 ` Octavian Purdila 2009-09-01 16:14 ` Eric Dumazet 2009-09-01 16:56 ` Octavian Purdila 2009-09-01 16:23 ` Stephen Hemminger 2009-09-01 15:59 ` Stephen Hemminger 2009-09-01 16:13 ` Eric Dumazet 2009-09-01 21:24 ` Stephen Hemminger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).