netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
@ 2017-09-27 17:14 Paolo Abeni
  2017-09-27 17:44 ` Wei Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2017-09-27 17:14 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Hannes Frederic Sowa, Wei Wang

When a host is under high ipv6 load, the updates of the ingress
route '__use' field are a source of relevant contention: such
field is updated for each packet and several cores can access
concurrently the dst, even if percpu dst entries are available
and used.

The __use value is just a rough indication of the dst usage: is
already updated concurrently from multiple CPUs without any lock,
so we can decrease the contention leveraging the percpu dst to perform
__use bulk updates: if a per cpu dst entry is found, we account on
such entry and we flush the percpu counter once per jiffy.

Performace gain under UDP flood is as follows:

nr RX queues	before		after		delta
		kpps		kpps		(%)
2		2316		2688		16
3		3033		3605		18
4		3963		4328		9
5		4379		5253		19
6		5137		6000		16

Performance gain under TCP syn flood should be measurable as well.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/route.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 26cc9f483b6d..e69f304de950 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1170,12 +1170,24 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 
 		struct rt6_info *pcpu_rt;
 
-		rt->dst.lastuse = jiffies;
-		rt->dst.__use++;
 		pcpu_rt = rt6_get_pcpu_route(rt);
 
 		if (pcpu_rt) {
+			unsigned long ts;
+
 			read_unlock_bh(&table->tb6_lock);
+
+			/* do lazy updates of rt->dst->__use, at most once
+			 * per jiffy, to avoid contention on such cacheline.
+			 */
+			ts = jiffies;
+			pcpu_rt->dst.__use++;
+			if (pcpu_rt->dst.lastuse != ts) {
+				rt->dst.__use += pcpu_rt->dst.__use;
+				rt->dst.lastuse = ts;
+				pcpu_rt->dst.__use = 0;
+				pcpu_rt->dst.lastuse = ts;
+			}
 		} else {
 			/* We have to do the read_unlock first
 			 * because rt6_make_pcpu_route() may trigger
@@ -1185,6 +1197,8 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			read_unlock_bh(&table->tb6_lock);
 			pcpu_rt = rt6_make_pcpu_route(rt);
 			dst_release(&rt->dst);
+			rt->dst.lastuse = jiffies;
+			rt->dst.__use++;
 		}
 
 		trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6);
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
  2017-09-27 17:14 [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available Paolo Abeni
@ 2017-09-27 17:44 ` Wei Wang
  2017-09-27 18:03   ` Eric Dumazet
  2017-09-27 18:25   ` Cong Wang
  0 siblings, 2 replies; 6+ messages in thread
From: Wei Wang @ 2017-09-27 17:44 UTC (permalink / raw)
  To: Paolo Abeni, Eric Dumazet, Martin KaFai Lau
  Cc: Linux Kernel Network Developers, David S. Miller,
	Hannes Frederic Sowa

On Wed, Sep 27, 2017 at 10:14 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> When a host is under high ipv6 load, the updates of the ingress
> route '__use' field are a source of relevant contention: such
> field is updated for each packet and several cores can access
> concurrently the dst, even if percpu dst entries are available
> and used.
>
> The __use value is just a rough indication of the dst usage: is
> already updated concurrently from multiple CPUs without any lock,
> so we can decrease the contention leveraging the percpu dst to perform
> __use bulk updates: if a per cpu dst entry is found, we account on
> such entry and we flush the percpu counter once per jiffy.
>
> Performace gain under UDP flood is as follows:
>
> nr RX queues    before          after           delta
>                 kpps            kpps            (%)
> 2               2316            2688            16
> 3               3033            3605            18
> 4               3963            4328            9
> 5               4379            5253            19
> 6               5137            6000            16
>
> Performance gain under TCP syn flood should be measurable as well.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv6/route.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 26cc9f483b6d..e69f304de950 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1170,12 +1170,24 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
>
>                 struct rt6_info *pcpu_rt;
>
> -               rt->dst.lastuse = jiffies;
> -               rt->dst.__use++;
>                 pcpu_rt = rt6_get_pcpu_route(rt);
>
>                 if (pcpu_rt) {
> +                       unsigned long ts;
> +
>                         read_unlock_bh(&table->tb6_lock);
> +
> +                       /* do lazy updates of rt->dst->__use, at most once
> +                        * per jiffy, to avoid contention on such cacheline.
> +                        */
> +                       ts = jiffies;
> +                       pcpu_rt->dst.__use++;
> +                       if (pcpu_rt->dst.lastuse != ts) {
> +                               rt->dst.__use += pcpu_rt->dst.__use;
> +                               rt->dst.lastuse = ts;
> +                               pcpu_rt->dst.__use = 0;
> +                               pcpu_rt->dst.lastuse = ts;
> +                       }
>                 } else {
>                         /* We have to do the read_unlock first
>                          * because rt6_make_pcpu_route() may trigger
> @@ -1185,6 +1197,8 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
>                         read_unlock_bh(&table->tb6_lock);
>                         pcpu_rt = rt6_make_pcpu_route(rt);
>                         dst_release(&rt->dst);
> +                       rt->dst.lastuse = jiffies;
> +                       rt->dst.__use++;
>                 }
>
>                 trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6);
> --
> 2.13.5
>

Hi Paolo,

Eric and I discussed about this issue recently as well :).

What about the following change:

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..33e1d86bcef6 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst)
 static inline void dst_use(struct dst_entry *dst, unsigned long time)
 {
        dst_hold(dst);
-       dst->__use++;
-       dst->lastuse = time;
+       if (dst->lastuse != time) {
+               dst->__use++;
+               dst->lastuse = time;
+       }
 }

 static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
 {
-       dst->__use++;
-       dst->lastuse = time;
+       if (dst->lastuse != time) {
+               dst->__use++;
+               dst->lastuse = time;
+       }
 }

 static inline struct dst_entry *dst_clone(struct dst_entry *dst)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 26cc9f483b6d..e195f093add3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net,
struct fib6_table *table,

                struct rt6_info *pcpu_rt;

-               rt->dst.lastuse = jiffies;
-               rt->dst.__use++;
+               dst_use_noref(rt, jiffies);
                pcpu_rt = rt6_get_pcpu_route(rt);

                if (pcpu_rt) {


This way we always only update dst->__use and dst->lastuse at most
once per jiffy. And we don't really need to update pcpu and then do
the copy over from pcpu_rt to rt operation.

Another thing is that I don't really see any places making use of
dst->__use. So maybe we can also get rid of this dst->__use field?

Thanks.
Wei

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
  2017-09-27 17:44 ` Wei Wang
@ 2017-09-27 18:03   ` Eric Dumazet
  2017-09-27 19:01     ` Martin KaFai Lau
  2017-09-28 10:34     ` Paolo Abeni
  2017-09-27 18:25   ` Cong Wang
  1 sibling, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2017-09-27 18:03 UTC (permalink / raw)
  To: Wei Wang
  Cc: Paolo Abeni, Martin KaFai Lau, Linux Kernel Network Developers,
	David S. Miller, Hannes Frederic Sowa

>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c

>
> Hi Paolo,
>
> Eric and I discussed about this issue recently as well :).
>
> What about the following change:
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a352..33e1d86bcef6 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst)
>  static inline void dst_use(struct dst_entry *dst, unsigned long time)
>  {
>         dst_hold(dst);
> -       dst->__use++;
> -       dst->lastuse = time;
> +       if (dst->lastuse != time) {
> +               dst->__use++;
> +               dst->lastuse = time;
> +       }
>  }
>
>  static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>  {
> -       dst->__use++;
> -       dst->lastuse = time;
> +       if (dst->lastuse != time) {
> +               dst->__use++;
> +               dst->lastuse = time;
> +       }
>  }
>
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 26cc9f483b6d..e195f093add3 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net,
> struct fib6_table *table,
>
>                 struct rt6_info *pcpu_rt;
>
> -               rt->dst.lastuse = jiffies;
> -               rt->dst.__use++;
> +               dst_use_noref(rt, jiffies);
>                 pcpu_rt = rt6_get_pcpu_route(rt);
>
>                 if (pcpu_rt) {
>
>
> This way we always only update dst->__use and dst->lastuse at most
> once per jiffy. And we don't really need to update pcpu and then do
> the copy over from pcpu_rt to rt operation.
>
> Another thing is that I don't really see any places making use of
> dst->__use. So maybe we can also get rid of this dst->__use field?
>
> Thanks.
> Wei

Paolo, given we are very close to send Wei awesome work about IPv6
routing cache,
could we ask you to wait few days before doing the same work from your side ?

Main issue is the rwlock, and we are converting it to full RCU.

You are sending patches that are making our job very difficult IMO.

We chose to mimic the change done in neighbour code years ago
( 0ed8ddf4045fcfcac36bad753dc4046118c603ec )

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
  2017-09-27 17:44 ` Wei Wang
  2017-09-27 18:03   ` Eric Dumazet
@ 2017-09-27 18:25   ` Cong Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Cong Wang @ 2017-09-27 18:25 UTC (permalink / raw)
  To: Wei Wang
  Cc: Paolo Abeni, Eric Dumazet, Martin KaFai Lau,
	Linux Kernel Network Developers, David S. Miller,
	Hannes Frederic Sowa

On Wed, Sep 27, 2017 at 10:44 AM, Wei Wang <weiwan@google.com> wrote:
> Another thing is that I don't really see any places making use of
> dst->__use. So maybe we can also get rid of this dst->__use field?

It is used in rtnl_put_cacheinfo():

        struct rta_cacheinfo ci = {
                .rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse),
                .rta_used = dst->__use,
                .rta_clntref = atomic_read(&(dst->__refcnt)),
                .rta_error = error,
                .rta_id =  id,
        };

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
  2017-09-27 18:03   ` Eric Dumazet
@ 2017-09-27 19:01     ` Martin KaFai Lau
  2017-09-28 10:34     ` Paolo Abeni
  1 sibling, 0 replies; 6+ messages in thread
From: Martin KaFai Lau @ 2017-09-27 19:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Wei Wang, Paolo Abeni, Linux Kernel Network Developers,
	David S. Miller, Hannes Frederic Sowa

On Wed, Sep 27, 2017 at 06:03:33PM +0000, Eric Dumazet wrote:
> >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> 
> >
> > Hi Paolo,
> >
> > Eric and I discussed about this issue recently as well :).
> >
> > What about the following change:
> >
> > diff --git a/include/net/dst.h b/include/net/dst.h
> > index 93568bd0a352..33e1d86bcef6 100644
> > --- a/include/net/dst.h
> > +++ b/include/net/dst.h
> > @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst)
> >  static inline void dst_use(struct dst_entry *dst, unsigned long time)
> >  {
> >         dst_hold(dst);
> > -       dst->__use++;
> > -       dst->lastuse = time;
> > +       if (dst->lastuse != time) {
> > +               dst->__use++;
> > +               dst->lastuse = time;
> > +       }
> >  }
> >
> >  static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
> >  {
> > -       dst->__use++;
> > -       dst->lastuse = time;
> > +       if (dst->lastuse != time) {
> > +               dst->__use++;
> > +               dst->lastuse = time;
> > +       }
> >  }
> >
> >  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 26cc9f483b6d..e195f093add3 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net,
> > struct fib6_table *table,
> >
> >                 struct rt6_info *pcpu_rt;
> >
> > -               rt->dst.lastuse = jiffies;
> > -               rt->dst.__use++;
> > +               dst_use_noref(rt, jiffies);
> >                 pcpu_rt = rt6_get_pcpu_route(rt);
> >
> >                 if (pcpu_rt) {
> >
> >
> > This way we always only update dst->__use and dst->lastuse at most
> > once per jiffy. And we don't really need to update pcpu and then do
> > the copy over from pcpu_rt to rt operation.
> >
> > Another thing is that I don't really see any places making use of
> > dst->__use. So maybe we can also get rid of this dst->__use field?
> >
> > Thanks.
> > Wei
> 
> Paolo, given we are very close to send Wei awesome work about IPv6
> routing cache,
> could we ask you to wait few days before doing the same work from your side ?
> 
> Main issue is the rwlock, and we are converting it to full RCU.
+1

We can get a better picture of other optimizations once
the rwlock is removed.

> 
> You are sending patches that are making our job very difficult IMO.
> 
> We chose to mimic the change done in neighbour code years ago
> ( 0ed8ddf4045fcfcac36bad753dc4046118c603ec )
> 
> Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
  2017-09-27 18:03   ` Eric Dumazet
  2017-09-27 19:01     ` Martin KaFai Lau
@ 2017-09-28 10:34     ` Paolo Abeni
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Abeni @ 2017-09-28 10:34 UTC (permalink / raw)
  To: Eric Dumazet, Wei Wang
  Cc: Martin KaFai Lau, Linux Kernel Network Developers,
	David S. Miller, Hannes Frederic Sowa

On Wed, 2017-09-27 at 11:03 -0700, Eric Dumazet wrote:
> > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > 
> > Hi Paolo,
> > 
> > Eric and I discussed about this issue recently as well :).
> > 
> > What about the following change:
> > 
> > diff --git a/include/net/dst.h b/include/net/dst.h
> > index 93568bd0a352..33e1d86bcef6 100644
> > --- a/include/net/dst.h
> > +++ b/include/net/dst.h
> > @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst)
> >  static inline void dst_use(struct dst_entry *dst, unsigned long time)
> >  {
> >         dst_hold(dst);
> > -       dst->__use++;
> > -       dst->lastuse = time;
> > +       if (dst->lastuse != time) {
> > +               dst->__use++;
> > +               dst->lastuse = time;
> > +       }
> >  }
> > 
> >  static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
> >  {
> > -       dst->__use++;
> > -       dst->lastuse = time;
> > +       if (dst->lastuse != time) {
> > +               dst->__use++;
> > +               dst->lastuse = time;
> > +       }
> >  }
> > 
> >  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 26cc9f483b6d..e195f093add3 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net,
> > struct fib6_table *table,
> > 
> >                 struct rt6_info *pcpu_rt;
> > 
> > -               rt->dst.lastuse = jiffies;
> > -               rt->dst.__use++;
> > +               dst_use_noref(rt, jiffies);
> >                 pcpu_rt = rt6_get_pcpu_route(rt);
> > 
> >                 if (pcpu_rt) {
> > 
> > 
> > This way we always only update dst->__use and dst->lastuse at most
> > once per jiffy. And we don't really need to update pcpu and then do
> > the copy over from pcpu_rt to rt operation.
> > 
> > Another thing is that I don't really see any places making use of
> > dst->__use. So maybe we can also get rid of this dst->__use field?
> > 
> > Thanks.
> > Wei
> 
> Paolo, given we are very close to send Wei awesome work about IPv6
> routing cache,
> could we ask you to wait few days before doing the same work from your side ?

Ok, no problem - thanks instead. I'll wait for it. 

> Main issue is the rwlock, and we are converting it to full RCU.
> 
> You are sending patches that are making our job very difficult IMO.

On my side I have only another small change in this area, I'll
eventually try to rebase it later, if still relevant.

Or I can share it now, if you are interested.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-09-28 10:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-27 17:14 [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available Paolo Abeni
2017-09-27 17:44 ` Wei Wang
2017-09-27 18:03   ` Eric Dumazet
2017-09-27 19:01     ` Martin KaFai Lau
2017-09-28 10:34     ` Paolo Abeni
2017-09-27 18:25   ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).