* neighbour table RCU
@ 2009-08-31 22:04 Stephen Hemminger
2009-09-01 6:50 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2009-08-31 22:04 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
Looking at the neighbour table, it should be possible to get
rid of the two reader/writer locks. The hash table lock is pretty
amenable to RCU, but the dynamic resizing makes it non-trivial.
Thinking of using a combination of RCU and sequence counts so that the
reader would just rescan if resize was in progress.
The reader/writer lock on the neighbour entry is more of a problem.
Probably would be simpler/faster to change it into a spinlock and
be done with it.
The reader/writer lock is also used for the proxy list hash table,
but that can just be a simple spinlock.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-08-31 22:04 neighbour table RCU Stephen Hemminger
@ 2009-09-01 6:50 ` Eric Dumazet
2009-09-01 15:55 ` Octavian Purdila
2009-09-01 15:59 ` Stephen Hemminger
0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2009-09-01 6:50 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
Stephen Hemminger a écrit :
> Looking at the neighbour table, it should be possible to get
> rid of the two reader/writer locks. The hash table lock is pretty
> amenable to RCU, but the dynamic resizing makes it non-trivial.
> Thinking of using a combination of RCU and sequence counts so that the
> reader would just rescan if resize was in progress.
I am not sure neigh_tbl_lock rwlock should be changed, I did not
see any contention on it.
>
> The reader/writer lock on the neighbour entry is more of a problem.
> Probably would be simpler/faster to change it into a spinlock and
> be done with it.
>
> The reader/writer lock is also used for the proxy list hash table,
> but that can just be a simple spinlock.
>
This is probably is the only thing we want to do at this moment,
halving atomic ops on neigh_resolve_output()
But why neigh_resolve_output() was called so much in the bench
is the question...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-09-01 6:50 ` Eric Dumazet
@ 2009-09-01 15:55 ` Octavian Purdila
2009-09-01 16:14 ` Eric Dumazet
2009-09-01 16:23 ` Stephen Hemminger
2009-09-01 15:59 ` Stephen Hemminger
1 sibling, 2 replies; 9+ messages in thread
From: Octavian Purdila @ 2009-09-01 15:55 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev
On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
> Stephen Hemminger a écrit :
> > Looking at the neighbour table, it should be possible to get
> > rid of the two reader/writer locks. The hash table lock is pretty
> > amenable to RCU, but the dynamic resizing makes it non-trivial.
> > Thinking of using a combination of RCU and sequence counts so that the
> > reader would just rescan if resize was in progress.
>
> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> see any contention on it.
>
Speaking about neighbour optimizations, here is a RFC patch which makes the
tables double linked, for constant time deletion. It has given us a significant
performance improvement - in less then usual setups though, with lots of
neighbours.
Would something like this be acceptable for upstream? (pardon the p4 diff dump
:) - but I think it will give a rough idea, if acceptable will clean it up and
properly submit it)
BTW, would switching to list_head be better?
Thanks,
tavi
==== //packages/linux-2.6.7/main/src/include/net/neighbour.h#2 (text) ====
@@ -56,6 +56,7 @@
struct neigh_parms
{
struct neigh_parms *next;
+ struct neigh_parms **pprev;
int (*neigh_setup)(struct neighbour *);
struct neigh_table *tbl;
int entries;
==== //packages/linux-2.6.7/main/src/net/core/neighbour.c#3 (text) ====
@@ -1127,8 +1127,10 @@
}
p->sysctl_table = NULL;
write_lock_bh(&tbl->lock);
- p->next = tbl->parms.next;
+ if ((p->next = tbl->parms.next))
+ p->next->pprev = &p->next;
tbl->parms.next = p;
+ p->pprev = &tbl->parms.next;
write_unlock_bh(&tbl->lock);
}
return p;
@@ -1136,21 +1138,14 @@
void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
{
- struct neigh_parms **p;
-
if (!parms || parms == &tbl->parms)
return;
write_lock_bh(&tbl->lock);
- for (p = &tbl->parms.next; *p; p = &(*p)->next) {
- if (*p == parms) {
- *p = parms->next;
- write_unlock_bh(&tbl->lock);
- kfree(parms);
- return;
- }
- }
+ if ((*parms->pprev = parms->next))
+ parms->next->pprev = parms->pprev;
write_unlock_bh(&tbl->lock);
- NEIGH_PRINTK1("neigh_parms_release: not found\n");
+ kfree(parms);
+ return;
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-09-01 6:50 ` Eric Dumazet
2009-09-01 15:55 ` Octavian Purdila
@ 2009-09-01 15:59 ` Stephen Hemminger
2009-09-01 16:13 ` Eric Dumazet
1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2009-09-01 15:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On Tue, 01 Sep 2009 08:50:17 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Stephen Hemminger a écrit :
> > Looking at the neighbour table, it should be possible to get
> > rid of the two reader/writer locks. The hash table lock is pretty
> > amenable to RCU, but the dynamic resizing makes it non-trivial.
> > Thinking of using a combination of RCU and sequence counts so that the
> > reader would just rescan if resize was in progress.
>
> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> see any contention on it.
>
> >
> > The reader/writer lock on the neighbour entry is more of a problem.
> > Probably would be simpler/faster to change it into a spinlock and
> > be done with it.
> >
> > The reader/writer lock is also used for the proxy list hash table,
> > but that can just be a simple spinlock.
> >
>
> This is probably is the only thing we want to do at this moment,
> halving atomic ops on neigh_resolve_output()
>
> But why neigh_resolve_output() was called so much in the bench
> is the question...
>
Every packet has to have an ARP resolution.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-09-01 15:59 ` Stephen Hemminger
@ 2009-09-01 16:13 ` Eric Dumazet
2009-09-01 21:24 ` Stephen Hemminger
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-09-01 16:13 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
Stephen Hemminger a écrit :
> On Tue, 01 Sep 2009 08:50:17 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> Stephen Hemminger a écrit :
>>> Looking at the neighbour table, it should be possible to get
>>> rid of the two reader/writer locks. The hash table lock is pretty
>>> amenable to RCU, but the dynamic resizing makes it non-trivial.
>>> Thinking of using a combination of RCU and sequence counts so that the
>>> reader would just rescan if resize was in progress.
>> I am not sure neigh_tbl_lock rwlock should be changed, I did not
>> see any contention on it.
>>
>>> The reader/writer lock on the neighbour entry is more of a problem.
>>> Probably would be simpler/faster to change it into a spinlock and
>>> be done with it.
>>>
>>> The reader/writer lock is also used for the proxy list hash table,
>>> but that can just be a simple spinlock.
>>>
>> This is probably is the only thing we want to do at this moment,
>> halving atomic ops on neigh_resolve_output()
>>
>> But why neigh_resolve_output() was called so much in the bench
>> is the question...
>>
>
> Every packet has to have an ARP resolution.
>
Sure, but I thought we had a cache ?
static inline int ip_finish_output2(struct sk_buff *skb)
{
...
if (dst->hh)
return neigh_hh_output(dst->hh, skb);
else if (dst->neighbour)
return dst->neighbour->output(skb); << should fill cache first time >>
...
}
in my pktgen benches, I always hit same dst so should take the hh cache ?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-09-01 15:55 ` Octavian Purdila
@ 2009-09-01 16:14 ` Eric Dumazet
2009-09-01 16:56 ` Octavian Purdila
2009-09-01 16:23 ` Stephen Hemminger
1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-09-01 16:14 UTC (permalink / raw)
To: Octavian Purdila; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev
Octavian Purdila a écrit :
> On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
>> Stephen Hemminger a écrit :
>>> Looking at the neighbour table, it should be possible to get
>>> rid of the two reader/writer locks. The hash table lock is pretty
>>> amenable to RCU, but the dynamic resizing makes it non-trivial.
>>> Thinking of using a combination of RCU and sequence counts so that the
>>> reader would just rescan if resize was in progress.
>> I am not sure neigh_tbl_lock rwlock should be changed, I did not
>> see any contention on it.
>>
>
> Speaking about neighbour optimizations, here is a RFC patch which makes the
> tables double linked, for constant time deletion. It has given us a significant
> performance improvement - in less then usual setups though, with lots of
> neighbours.
How many "struct neigh_parms" do you have in your setups, and
what is the frequency of neigh_parms_release() calls ???
>
> Would something like this be acceptable for upstream? (pardon the p4 diff dump
> :) - but I think it will give a rough idea, if acceptable will clean it up and
> properly submit it)
Seems straightforward
>
> BTW, would switching to list_head be better?
Yes, definitly :)
>
> Thanks,
> tavi
>
> ==== //packages/linux-2.6.7/main/src/include/net/neighbour.h#2 (text) ====
>
> @@ -56,6 +56,7 @@
> struct neigh_parms
> {
> struct neigh_parms *next;
> + struct neigh_parms **pprev;
> int (*neigh_setup)(struct neighbour *);
> struct neigh_table *tbl;
> int entries;
>
> ==== //packages/linux-2.6.7/main/src/net/core/neighbour.c#3 (text) ====
>
> @@ -1127,8 +1127,10 @@
> }
> p->sysctl_table = NULL;
> write_lock_bh(&tbl->lock);
> - p->next = tbl->parms.next;
> + if ((p->next = tbl->parms.next))
> + p->next->pprev = &p->next;
> tbl->parms.next = p;
> + p->pprev = &tbl->parms.next;
> write_unlock_bh(&tbl->lock);
> }
> return p;
> @@ -1136,21 +1138,14 @@
>
> void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
> {
> - struct neigh_parms **p;
> -
> if (!parms || parms == &tbl->parms)
> return;
> write_lock_bh(&tbl->lock);
> - for (p = &tbl->parms.next; *p; p = &(*p)->next) {
> - if (*p == parms) {
> - *p = parms->next;
> - write_unlock_bh(&tbl->lock);
> - kfree(parms);
> - return;
> - }
> - }
> + if ((*parms->pprev = parms->next))
> + parms->next->pprev = parms->pprev;
> write_unlock_bh(&tbl->lock);
> - NEIGH_PRINTK1("neigh_parms_release: not found\n");
> + kfree(parms);
> + return;
> }
> --
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-09-01 15:55 ` Octavian Purdila
2009-09-01 16:14 ` Eric Dumazet
@ 2009-09-01 16:23 ` Stephen Hemminger
1 sibling, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2009-09-01 16:23 UTC (permalink / raw)
To: Octavian Purdila; +Cc: Eric Dumazet, Lucian Adrian Grijincu, netdev
On Tue, 1 Sep 2009 18:55:34 +0300
Octavian Purdila <opurdila@ixiacom.com> wrote:
> On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
> > Stephen Hemminger a écrit :
> > > Looking at the neighbour table, it should be possible to get
> > > rid of the two reader/writer locks. The hash table lock is pretty
> > > amenable to RCU, but the dynamic resizing makes it non-trivial.
> > > Thinking of using a combination of RCU and sequence counts so that the
> > > reader would just rescan if resize was in progress.
> >
> > I am not sure neigh_tbl_lock rwlock should be changed, I did not
> > see any contention on it.
> >
>
> Speaking about neighbour optimizations, here is a RFC patch which makes the
> tables double linked, for constant time deletion. It has given us a significant
> performance improvement - in less then usual setups though, with lots of
> neighbours.
>
> Would something like this be acceptable for upstream? (pardon the p4 diff dump
> :) - but I think it will give a rough idea, if acceptable will clean it up and
> properly submit it)
>
> BTW, would switching to list_head be better?
Use hlist for the neighbour table. It has the right properties
and makes future RCU easier.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-09-01 16:14 ` Eric Dumazet
@ 2009-09-01 16:56 ` Octavian Purdila
0 siblings, 0 replies; 9+ messages in thread
From: Octavian Purdila @ 2009-09-01 16:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev
On Tuesday 01 September 2009 19:14:37 Eric Dumazet wrote:
> Octavian Purdila a écrit :
> > On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
> >> Stephen Hemminger a écrit :
> >>> Looking at the neighbour table, it should be possible to get
> >>> rid of the two reader/writer locks. The hash table lock is pretty
> >>> amenable to RCU, but the dynamic resizing makes it non-trivial.
> >>> Thinking of using a combination of RCU and sequence counts so that the
> >>> reader would just rescan if resize was in progress.
> >>
> >> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> >> see any contention on it.
> >
> > Speaking about neighbour optimizations, here is a RFC patch which makes
> > the tables double linked, for constant time deletion. It has given us a
> > significant performance improvement - in less then usual setups though,
> > with lots of neighbours.
>
> How many "struct neigh_parms" do you have in your setups, and
> what is the frequency of neigh_parms_release() calls ???
>
Each L3 interface part (at least for IPv4/IPv6 and I see DecNet as well) has a
neigh_parms associated. And we use up to 128K interfaces in certain tests
scenarios ;)
In our case the interfaces are created/destroyed dynamically, so when using a
large number of interfaces the test cleanup takes forever without this (and
other) patch(es).
Thanks,
tavi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: neighbour table RCU
2009-09-01 16:13 ` Eric Dumazet
@ 2009-09-01 21:24 ` Stephen Hemminger
0 siblings, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2009-09-01 21:24 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On Tue, 01 Sep 2009 18:13:40 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Stephen Hemminger a écrit :
> > On Tue, 01 Sep 2009 08:50:17 +0200
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >> Stephen Hemminger a écrit :
> >>> Looking at the neighbour table, it should be possible to get
> >>> rid of the two reader/writer locks. The hash table lock is pretty
> >>> amenable to RCU, but the dynamic resizing makes it non-trivial.
> >>> Thinking of using a combination of RCU and sequence counts so that the
> >>> reader would just rescan if resize was in progress.
> >> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> >> see any contention on it.
> >>
> >>> The reader/writer lock on the neighbour entry is more of a problem.
> >>> Probably would be simpler/faster to change it into a spinlock and
> >>> be done with it.
> >>>
> >>> The reader/writer lock is also used for the proxy list hash table,
> >>> but that can just be a simple spinlock.
> >>>
> >> This is probably is the only thing we want to do at this moment,
> >> halving atomic ops on neigh_resolve_output()
> >>
> >> But why neigh_resolve_output() was called so much in the bench
> >> is the question...
> >>
> >
> > Every packet has to have an ARP resolution.
> >
>
> Sure, but I thought we had a cache ?
>
> static inline int ip_finish_output2(struct sk_buff *skb)
> {
> ...
> if (dst->hh)
> return neigh_hh_output(dst->hh, skb);
> else if (dst->neighbour)
> return dst->neighbour->output(skb); << should fill cache first time >>
> ...
> }
>
> in my pktgen benches, I always hit same dst so should take the hh cache ?
I ping the remote host before starting pktgen, that way I figure
the ARP and cache is available.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-09-01 21:24 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-31 22:04 neighbour table RCU Stephen Hemminger
2009-09-01 6:50 ` Eric Dumazet
2009-09-01 15:55 ` Octavian Purdila
2009-09-01 16:14 ` Eric Dumazet
2009-09-01 16:56 ` Octavian Purdila
2009-09-01 16:23 ` Stephen Hemminger
2009-09-01 15:59 ` Stephen Hemminger
2009-09-01 16:13 ` Eric Dumazet
2009-09-01 21:24 ` Stephen Hemminger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).