netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* neighbour table RCU
@ 2009-08-31 22:04 Stephen Hemminger
  2009-09-01  6:50 ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2009-08-31 22:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Looking at the neighbour table, it should be possible to get
rid of the two reader/writer locks.  The hash table lock is pretty
amenable to RCU, but the dynamic resizing makes it non-trivial.
Thinking of using a combination of RCU and sequence counts so that the
reader would just rescan if resize was in progress.

The reader/writer lock on the neighbour entry is more of a problem.
Probably would be simpler/faster to change it into a spinlock and
be done with it.

The reader/writer lock is also used for the proxy list hash table,
but that can just be a simple spinlock.

-- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-08-31 22:04 neighbour table RCU Stephen Hemminger
@ 2009-09-01  6:50 ` Eric Dumazet
  2009-09-01 15:55   ` Octavian Purdila
  2009-09-01 15:59   ` Stephen Hemminger
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2009-09-01  6:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger a écrit :
> Looking at the neighbour table, it should be possible to get
> rid of the two reader/writer locks.  The hash table lock is pretty
> amenable to RCU, but the dynamic resizing makes it non-trivial.
> Thinking of using a combination of RCU and sequence counts so that the
> reader would just rescan if resize was in progress.

I am not sure neigh_tbl_lock rwlock should be changed, I did not
see any contention on it.

> 
> The reader/writer lock on the neighbour entry is more of a problem.
> Probably would be simpler/faster to change it into a spinlock and
> be done with it.
> 
> The reader/writer lock is also used for the proxy list hash table,
> but that can just be a simple spinlock.
> 

This is probably is the only thing we want to do at this moment,
halving atomic ops on neigh_resolve_output()

But why neigh_resolve_output() was called so much in the bench
is the question...


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-09-01  6:50 ` Eric Dumazet
@ 2009-09-01 15:55   ` Octavian Purdila
  2009-09-01 16:14     ` Eric Dumazet
  2009-09-01 16:23     ` Stephen Hemminger
  2009-09-01 15:59   ` Stephen Hemminger
  1 sibling, 2 replies; 9+ messages in thread
From: Octavian Purdila @ 2009-09-01 15:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev

On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
> Stephen Hemminger a écrit :
> > Looking at the neighbour table, it should be possible to get
> > rid of the two reader/writer locks.  The hash table lock is pretty
> > amenable to RCU, but the dynamic resizing makes it non-trivial.
> > Thinking of using a combination of RCU and sequence counts so that the
> > reader would just rescan if resize was in progress.
>
> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> see any contention on it.
>

Speaking about neighbour optimizations, here is a RFC patch which makes the 
tables double linked, for constant time deletion. It has given us a significant 
performance improvement - in less then usual setups though, with lots of 
neighbours.

Would something like this be acceptable for upstream? (pardon the p4 diff dump 
:) - but I think it will give a rough idea, if acceptable will clean it up and 
properly submit it)

BTW, would switching to list_head be better?

Thanks,
tavi

==== //packages/linux-2.6.7/main/src/include/net/neighbour.h#2 (text) ====

@@ -56,6 +56,7 @@
 struct neigh_parms
 {
        struct neigh_parms *next;
+       struct neigh_parms **pprev;
        int     (*neigh_setup)(struct neighbour *);
        struct neigh_table *tbl;
        int     entries;

==== //packages/linux-2.6.7/main/src/net/core/neighbour.c#3 (text) ====

@@ -1127,8 +1127,10 @@
                }
                p->sysctl_table = NULL;
                write_lock_bh(&tbl->lock);
-               p->next         = tbl->parms.next;
+               if ((p->next = tbl->parms.next))
+                       p->next->pprev = &p->next;
                tbl->parms.next = p;
+               p->pprev = &tbl->parms.next;
                write_unlock_bh(&tbl->lock);
        }
        return p;
@@ -1136,21 +1138,14 @@
 
 void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 {
-       struct neigh_parms **p;
-
        if (!parms || parms == &tbl->parms)
                return;
        write_lock_bh(&tbl->lock);
-       for (p = &tbl->parms.next; *p; p = &(*p)->next) {
-               if (*p == parms) {
-                       *p = parms->next;
-                       write_unlock_bh(&tbl->lock);
-                       kfree(parms);
-                       return;
-               }
-       }
+       if ((*parms->pprev = parms->next))
+               parms->next->pprev = parms->pprev;
        write_unlock_bh(&tbl->lock);
-       NEIGH_PRINTK1("neigh_parms_release: not found\n");
+       kfree(parms);
+       return;
 }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-09-01  6:50 ` Eric Dumazet
  2009-09-01 15:55   ` Octavian Purdila
@ 2009-09-01 15:59   ` Stephen Hemminger
  2009-09-01 16:13     ` Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2009-09-01 15:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Tue, 01 Sep 2009 08:50:17 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > Looking at the neighbour table, it should be possible to get
> > rid of the two reader/writer locks.  The hash table lock is pretty
> > amenable to RCU, but the dynamic resizing makes it non-trivial.
> > Thinking of using a combination of RCU and sequence counts so that the
> > reader would just rescan if resize was in progress.
> 
> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> see any contention on it.
> 
> > 
> > The reader/writer lock on the neighbour entry is more of a problem.
> > Probably would be simpler/faster to change it into a spinlock and
> > be done with it.
> > 
> > The reader/writer lock is also used for the proxy list hash table,
> > but that can just be a simple spinlock.
> > 
> 
> This is probably is the only thing we want to do at this moment,
> halving atomic ops on neigh_resolve_output()
> 
> But why neigh_resolve_output() was called so much in the bench
> is the question...
> 

Every packet has to have an ARP resolution.

-- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-09-01 15:59   ` Stephen Hemminger
@ 2009-09-01 16:13     ` Eric Dumazet
  2009-09-01 21:24       ` Stephen Hemminger
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-09-01 16:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger a écrit :
> On Tue, 01 Sep 2009 08:50:17 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Stephen Hemminger a écrit :
>>> Looking at the neighbour table, it should be possible to get
>>> rid of the two reader/writer locks.  The hash table lock is pretty
>>> amenable to RCU, but the dynamic resizing makes it non-trivial.
>>> Thinking of using a combination of RCU and sequence counts so that the
>>> reader would just rescan if resize was in progress.
>> I am not sure neigh_tbl_lock rwlock should be changed, I did not
>> see any contention on it.
>>
>>> The reader/writer lock on the neighbour entry is more of a problem.
>>> Probably would be simpler/faster to change it into a spinlock and
>>> be done with it.
>>>
>>> The reader/writer lock is also used for the proxy list hash table,
>>> but that can just be a simple spinlock.
>>>
>> This is probably is the only thing we want to do at this moment,
>> halving atomic ops on neigh_resolve_output()
>>
>> But why neigh_resolve_output() was called so much in the bench
>> is the question...
>>
> 
> Every packet has to have an ARP resolution.
> 

Sure, but I thought we had a cache ?

static inline int ip_finish_output2(struct sk_buff *skb)
{
...
	if (dst->hh)
		return neigh_hh_output(dst->hh, skb);
	else if (dst->neighbour)
		return dst->neighbour->output(skb);  << should fill cache first time >>
...
}

in my pktgen benches, I always hit same dst so should take the hh cache ?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-09-01 15:55   ` Octavian Purdila
@ 2009-09-01 16:14     ` Eric Dumazet
  2009-09-01 16:56       ` Octavian Purdila
  2009-09-01 16:23     ` Stephen Hemminger
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2009-09-01 16:14 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev

Octavian Purdila a écrit :
> On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
>> Stephen Hemminger a écrit :
>>> Looking at the neighbour table, it should be possible to get
>>> rid of the two reader/writer locks.  The hash table lock is pretty
>>> amenable to RCU, but the dynamic resizing makes it non-trivial.
>>> Thinking of using a combination of RCU and sequence counts so that the
>>> reader would just rescan if resize was in progress.
>> I am not sure neigh_tbl_lock rwlock should be changed, I did not
>> see any contention on it.
>>
> 
> Speaking about neighbour optimizations, here is a RFC patch which makes the 
> tables double linked, for constant time deletion. It has given us a significant 
> performance improvement - in less then usual setups though, with lots of 
> neighbours.

How many "struct neigh_parms" do you have in your setups, and
what is the frequency of neigh_parms_release() calls ???

> 
> Would something like this be acceptable for upstream? (pardon the p4 diff dump 
> :) - but I think it will give a rough idea, if acceptable will clean it up and 
> properly submit it)

Seems straightforward

> 
> BTW, would switching to list_head be better?

Yes, definitly :)

> 
> Thanks,
> tavi
> 
> ==== //packages/linux-2.6.7/main/src/include/net/neighbour.h#2 (text) ====
> 
> @@ -56,6 +56,7 @@
>  struct neigh_parms
>  {
>         struct neigh_parms *next;
> +       struct neigh_parms **pprev;
>         int     (*neigh_setup)(struct neighbour *);
>         struct neigh_table *tbl;
>         int     entries;
> 
> ==== //packages/linux-2.6.7/main/src/net/core/neighbour.c#3 (text) ====
> 
> @@ -1127,8 +1127,10 @@
>                 }
>                 p->sysctl_table = NULL;
>                 write_lock_bh(&tbl->lock);
> -               p->next         = tbl->parms.next;
> +               if ((p->next = tbl->parms.next))
> +                       p->next->pprev = &p->next;
>                 tbl->parms.next = p;
> +               p->pprev = &tbl->parms.next;
>                 write_unlock_bh(&tbl->lock);
>         }
>         return p;
> @@ -1136,21 +1138,14 @@
>  
>  void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
>  {
> -       struct neigh_parms **p;
> -
>         if (!parms || parms == &tbl->parms)
>                 return;
>         write_lock_bh(&tbl->lock);
> -       for (p = &tbl->parms.next; *p; p = &(*p)->next) {
> -               if (*p == parms) {
> -                       *p = parms->next;
> -                       write_unlock_bh(&tbl->lock);
> -                       kfree(parms);
> -                       return;
> -               }
> -       }
> +       if ((*parms->pprev = parms->next))
> +               parms->next->pprev = parms->pprev;
>         write_unlock_bh(&tbl->lock);
> -       NEIGH_PRINTK1("neigh_parms_release: not found\n");
> +       kfree(parms);
> +       return;
>  }
> --

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-09-01 15:55   ` Octavian Purdila
  2009-09-01 16:14     ` Eric Dumazet
@ 2009-09-01 16:23     ` Stephen Hemminger
  1 sibling, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2009-09-01 16:23 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: Eric Dumazet, Lucian Adrian Grijincu, netdev

On Tue, 1 Sep 2009 18:55:34 +0300
Octavian Purdila <opurdila@ixiacom.com> wrote:

> On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
> > Stephen Hemminger a écrit :
> > > Looking at the neighbour table, it should be possible to get
> > > rid of the two reader/writer locks.  The hash table lock is pretty
> > > amenable to RCU, but the dynamic resizing makes it non-trivial.
> > > Thinking of using a combination of RCU and sequence counts so that the
> > > reader would just rescan if resize was in progress.
> >
> > I am not sure neigh_tbl_lock rwlock should be changed, I did not
> > see any contention on it.
> >
> 
> Speaking about neighbour optimizations, here is a RFC patch which makes the 
> tables double linked, for constant time deletion. It has given us a significant 
> performance improvement - in less then usual setups though, with lots of 
> neighbours.
> 
> Would something like this be acceptable for upstream? (pardon the p4 diff dump 
> :) - but I think it will give a rough idea, if acceptable will clean it up and 
> properly submit it)
> 
> BTW, would switching to list_head be better?

Use hlist for the neighbour table. It has the right properties
and makes future RCU easier.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-09-01 16:14     ` Eric Dumazet
@ 2009-09-01 16:56       ` Octavian Purdila
  0 siblings, 0 replies; 9+ messages in thread
From: Octavian Purdila @ 2009-09-01 16:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Lucian Adrian Grijincu, netdev

On Tuesday 01 September 2009 19:14:37 Eric Dumazet wrote:
> Octavian Purdila a écrit :
> > On Tuesday 01 September 2009 09:50:17 Eric Dumazet wrote:
> >> Stephen Hemminger a écrit :
> >>> Looking at the neighbour table, it should be possible to get
> >>> rid of the two reader/writer locks.  The hash table lock is pretty
> >>> amenable to RCU, but the dynamic resizing makes it non-trivial.
> >>> Thinking of using a combination of RCU and sequence counts so that the
> >>> reader would just rescan if resize was in progress.
> >>
> >> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> >> see any contention on it.
> >
> > Speaking about neighbour optimizations, here is a RFC patch which makes
> > the tables double linked, for constant time deletion. It has given us a
> > significant performance improvement - in less then usual setups though,
> > with lots of neighbours.
>
> How many "struct neigh_parms" do you have in your setups, and
> what is the frequency of neigh_parms_release() calls ???
>

Each L3 interface part (at least for IPv4/IPv6 and I see DecNet as well) has a 
neigh_parms associated. And we use up to 128K interfaces in certain tests 
scenarios ;)

In our case the interfaces are created/destroyed dynamically, so when using a 
large number of interfaces the test cleanup takes forever without this (and 
other) patch(es).

Thanks,
tavi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: neighbour table RCU
  2009-09-01 16:13     ` Eric Dumazet
@ 2009-09-01 21:24       ` Stephen Hemminger
  0 siblings, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2009-09-01 21:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Tue, 01 Sep 2009 18:13:40 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > On Tue, 01 Sep 2009 08:50:17 +0200
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> >> Stephen Hemminger a écrit :
> >>> Looking at the neighbour table, it should be possible to get
> >>> rid of the two reader/writer locks.  The hash table lock is pretty
> >>> amenable to RCU, but the dynamic resizing makes it non-trivial.
> >>> Thinking of using a combination of RCU and sequence counts so that the
> >>> reader would just rescan if resize was in progress.
> >> I am not sure neigh_tbl_lock rwlock should be changed, I did not
> >> see any contention on it.
> >>
> >>> The reader/writer lock on the neighbour entry is more of a problem.
> >>> Probably would be simpler/faster to change it into a spinlock and
> >>> be done with it.
> >>>
> >>> The reader/writer lock is also used for the proxy list hash table,
> >>> but that can just be a simple spinlock.
> >>>
> >> This is probably is the only thing we want to do at this moment,
> >> halving atomic ops on neigh_resolve_output()
> >>
> >> But why neigh_resolve_output() was called so much in the bench
> >> is the question...
> >>
> > 
> > Every packet has to have an ARP resolution.
> > 
> 
> Sure, but I thought we had a cache ?
> 
> static inline int ip_finish_output2(struct sk_buff *skb)
> {
> ...
> 	if (dst->hh)
> 		return neigh_hh_output(dst->hh, skb);
> 	else if (dst->neighbour)
> 		return dst->neighbour->output(skb);  << should fill cache first time >>
> ...
> }
> 
> in my pktgen benches, I always hit same dst so should take the hh cache ?

I ping the remote host before starting pktgen, that way I figure
the ARP and cache is available.

-- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-09-01 21:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-31 22:04 neighbour table RCU Stephen Hemminger
2009-09-01  6:50 ` Eric Dumazet
2009-09-01 15:55   ` Octavian Purdila
2009-09-01 16:14     ` Eric Dumazet
2009-09-01 16:56       ` Octavian Purdila
2009-09-01 16:23     ` Stephen Hemminger
2009-09-01 15:59   ` Stephen Hemminger
2009-09-01 16:13     ` Eric Dumazet
2009-09-01 21:24       ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).