* [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available @ 2017-09-27 17:14 Paolo Abeni 2017-09-27 17:44 ` Wei Wang 0 siblings, 1 reply; 6+ messages in thread From: Paolo Abeni @ 2017-09-27 17:14 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Hannes Frederic Sowa, Wei Wang When a host is under high ipv6 load, the updates of the ingress route '__use' field are a source of relevant contention: such field is updated for each packet and several cores can access concurrently the dst, even if percpu dst entries are available and used. The __use value is just a rough indication of the dst usage: is already updated concurrently from multiple CPUs without any lock, so we can decrease the contention leveraging the percpu dst to perform __use bulk updates: if a per cpu dst entry is found, we account on such entry and we flush the percpu counter once per jiffy. Performace gain under UDP flood is as follows: nr RX queues before after delta kpps kpps (%) 2 2316 2688 16 3 3033 3605 18 4 3963 4328 9 5 4379 5253 19 6 5137 6000 16 Performance gain under TCP syn flood should be measurable as well. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/ipv6/route.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 26cc9f483b6d..e69f304de950 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1170,12 +1170,24 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, struct rt6_info *pcpu_rt; - rt->dst.lastuse = jiffies; - rt->dst.__use++; pcpu_rt = rt6_get_pcpu_route(rt); if (pcpu_rt) { + unsigned long ts; + read_unlock_bh(&table->tb6_lock); + + /* do lazy updates of rt->dst->__use, at most once + * per jiffy, to avoid contention on such cacheline. + */ + ts = jiffies; + pcpu_rt->dst.__use++; + if (pcpu_rt->dst.lastuse != ts) { + rt->dst.__use += pcpu_rt->dst.__use; + rt->dst.lastuse = ts; + pcpu_rt->dst.__use = 0; + pcpu_rt->dst.lastuse = ts; + } } else { /* We have to do the read_unlock first * because rt6_make_pcpu_route() may trigger @@ -1185,6 +1197,8 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, read_unlock_bh(&table->tb6_lock); pcpu_rt = rt6_make_pcpu_route(rt); dst_release(&rt->dst); + rt->dst.lastuse = jiffies; + rt->dst.__use++; } trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6); -- 2.13.5 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available 2017-09-27 17:14 [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available Paolo Abeni @ 2017-09-27 17:44 ` Wei Wang 2017-09-27 18:03 ` Eric Dumazet 2017-09-27 18:25 ` Cong Wang 0 siblings, 2 replies; 6+ messages in thread From: Wei Wang @ 2017-09-27 17:44 UTC (permalink / raw) To: Paolo Abeni, Eric Dumazet, Martin KaFai Lau Cc: Linux Kernel Network Developers, David S. Miller, Hannes Frederic Sowa On Wed, Sep 27, 2017 at 10:14 AM, Paolo Abeni <pabeni@redhat.com> wrote: > When a host is under high ipv6 load, the updates of the ingress > route '__use' field are a source of relevant contention: such > field is updated for each packet and several cores can access > concurrently the dst, even if percpu dst entries are available > and used. > > The __use value is just a rough indication of the dst usage: is > already updated concurrently from multiple CPUs without any lock, > so we can decrease the contention leveraging the percpu dst to perform > __use bulk updates: if a per cpu dst entry is found, we account on > such entry and we flush the percpu counter once per jiffy. > > Performace gain under UDP flood is as follows: > > nr RX queues before after delta > kpps kpps (%) > 2 2316 2688 16 > 3 3033 3605 18 > 4 3963 4328 9 > 5 4379 5253 19 > 6 5137 6000 16 > > Performance gain under TCP syn flood should be measurable as well. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > net/ipv6/route.c | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 26cc9f483b6d..e69f304de950 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -1170,12 +1170,24 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, > > struct rt6_info *pcpu_rt; > > - rt->dst.lastuse = jiffies; > - rt->dst.__use++; > pcpu_rt = rt6_get_pcpu_route(rt); > > if (pcpu_rt) { > + unsigned long ts; > + > read_unlock_bh(&table->tb6_lock); > + > + /* do lazy updates of rt->dst->__use, at most once > + * per jiffy, to avoid contention on such cacheline. > + */ > + ts = jiffies; > + pcpu_rt->dst.__use++; > + if (pcpu_rt->dst.lastuse != ts) { > + rt->dst.__use += pcpu_rt->dst.__use; > + rt->dst.lastuse = ts; > + pcpu_rt->dst.__use = 0; > + pcpu_rt->dst.lastuse = ts; > + } > } else { > /* We have to do the read_unlock first > * because rt6_make_pcpu_route() may trigger > @@ -1185,6 +1197,8 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, > read_unlock_bh(&table->tb6_lock); > pcpu_rt = rt6_make_pcpu_route(rt); > dst_release(&rt->dst); > + rt->dst.lastuse = jiffies; > + rt->dst.__use++; > } > > trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6); > -- > 2.13.5 > Hi Paolo, Eric and I discussed about this issue recently as well :). What about the following change: diff --git a/include/net/dst.h b/include/net/dst.h index 93568bd0a352..33e1d86bcef6 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst) static inline void dst_use(struct dst_entry *dst, unsigned long time) { dst_hold(dst); - dst->__use++; - dst->lastuse = time; + if (dst->lastuse != time) { + dst->__use++; + dst->lastuse = time; + } } static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) { - dst->__use++; - dst->lastuse = time; + if (dst->lastuse != time) { + dst->__use++; + dst->lastuse = time; + } } static inline struct dst_entry *dst_clone(struct dst_entry *dst) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 26cc9f483b6d..e195f093add3 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, struct rt6_info *pcpu_rt; - rt->dst.lastuse = jiffies; - rt->dst.__use++; + dst_use_noref(rt, jiffies); pcpu_rt = rt6_get_pcpu_route(rt); if (pcpu_rt) { This way we always only update dst->__use and dst->lastuse at most once per jiffy. And we don't really need to update pcpu and then do the copy over from pcpu_rt to rt operation. Another thing is that I don't really see any places making use of dst->__use. So maybe we can also get rid of this dst->__use field? Thanks. Wei ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available 2017-09-27 17:44 ` Wei Wang @ 2017-09-27 18:03 ` Eric Dumazet 2017-09-27 19:01 ` Martin KaFai Lau 2017-09-28 10:34 ` Paolo Abeni 2017-09-27 18:25 ` Cong Wang 1 sibling, 2 replies; 6+ messages in thread From: Eric Dumazet @ 2017-09-27 18:03 UTC (permalink / raw) To: Wei Wang Cc: Paolo Abeni, Martin KaFai Lau, Linux Kernel Network Developers, David S. Miller, Hannes Frederic Sowa >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > Hi Paolo, > > Eric and I discussed about this issue recently as well :). > > What about the following change: > > diff --git a/include/net/dst.h b/include/net/dst.h > index 93568bd0a352..33e1d86bcef6 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst) > static inline void dst_use(struct dst_entry *dst, unsigned long time) > { > dst_hold(dst); > - dst->__use++; > - dst->lastuse = time; > + if (dst->lastuse != time) { > + dst->__use++; > + dst->lastuse = time; > + } > } > > static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) > { > - dst->__use++; > - dst->lastuse = time; > + if (dst->lastuse != time) { > + dst->__use++; > + dst->lastuse = time; > + } > } > > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 26cc9f483b6d..e195f093add3 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net, > struct fib6_table *table, > > struct rt6_info *pcpu_rt; > > - rt->dst.lastuse = jiffies; > - rt->dst.__use++; > + dst_use_noref(rt, jiffies); > pcpu_rt = rt6_get_pcpu_route(rt); > > if (pcpu_rt) { > > > This way we always only update dst->__use and dst->lastuse at most > once per jiffy. And we don't really need to update pcpu and then do > the copy over from pcpu_rt to rt operation. > > Another thing is that I don't really see any places making use of > dst->__use. So maybe we can also get rid of this dst->__use field? > > Thanks. > Wei Paolo, given we are very close to send Wei awesome work about IPv6 routing cache, could we ask you to wait few days before doing the same work from your side ? Main issue is the rwlock, and we are converting it to full RCU. You are sending patches that are making our job very difficult IMO. We chose to mimic the change done in neighbour code years ago ( 0ed8ddf4045fcfcac36bad753dc4046118c603ec ) Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available 2017-09-27 18:03 ` Eric Dumazet @ 2017-09-27 19:01 ` Martin KaFai Lau 2017-09-28 10:34 ` Paolo Abeni 1 sibling, 0 replies; 6+ messages in thread From: Martin KaFai Lau @ 2017-09-27 19:01 UTC (permalink / raw) To: Eric Dumazet Cc: Wei Wang, Paolo Abeni, Linux Kernel Network Developers, David S. Miller, Hannes Frederic Sowa On Wed, Sep 27, 2017 at 06:03:33PM +0000, Eric Dumazet wrote: > >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > > > > Hi Paolo, > > > > Eric and I discussed about this issue recently as well :). > > > > What about the following change: > > > > diff --git a/include/net/dst.h b/include/net/dst.h > > index 93568bd0a352..33e1d86bcef6 100644 > > --- a/include/net/dst.h > > +++ b/include/net/dst.h > > @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst) > > static inline void dst_use(struct dst_entry *dst, unsigned long time) > > { > > dst_hold(dst); > > - dst->__use++; > > - dst->lastuse = time; > > + if (dst->lastuse != time) { > > + dst->__use++; > > + dst->lastuse = time; > > + } > > } > > > > static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) > > { > > - dst->__use++; > > - dst->lastuse = time; > > + if (dst->lastuse != time) { > > + dst->__use++; > > + dst->lastuse = time; > > + } > > } > > > > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > index 26cc9f483b6d..e195f093add3 100644 > > --- a/net/ipv6/route.c > > +++ b/net/ipv6/route.c > > @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net, > > struct fib6_table *table, > > > > struct rt6_info *pcpu_rt; > > > > - rt->dst.lastuse = jiffies; > > - rt->dst.__use++; > > + dst_use_noref(rt, jiffies); > > pcpu_rt = rt6_get_pcpu_route(rt); > > > > if (pcpu_rt) { > > > > > > This way we always only update dst->__use and dst->lastuse at most > > once per jiffy. And we don't really need to update pcpu and then do > > the copy over from pcpu_rt to rt operation. > > > > Another thing is that I don't really see any places making use of > > dst->__use. So maybe we can also get rid of this dst->__use field? > > > > Thanks. > > Wei > > Paolo, given we are very close to send Wei awesome work about IPv6 > routing cache, > could we ask you to wait few days before doing the same work from your side ? > > Main issue is the rwlock, and we are converting it to full RCU. +1 We can get a better picture of other optimizations once the rwlock is removed. > > You are sending patches that are making our job very difficult IMO. > > We chose to mimic the change done in neighbour code years ago > ( 0ed8ddf4045fcfcac36bad753dc4046118c603ec ) > > Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available 2017-09-27 18:03 ` Eric Dumazet 2017-09-27 19:01 ` Martin KaFai Lau @ 2017-09-28 10:34 ` Paolo Abeni 1 sibling, 0 replies; 6+ messages in thread From: Paolo Abeni @ 2017-09-28 10:34 UTC (permalink / raw) To: Eric Dumazet, Wei Wang Cc: Martin KaFai Lau, Linux Kernel Network Developers, David S. Miller, Hannes Frederic Sowa On Wed, 2017-09-27 at 11:03 -0700, Eric Dumazet wrote: > > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > > > Hi Paolo, > > > > Eric and I discussed about this issue recently as well :). > > > > What about the following change: > > > > diff --git a/include/net/dst.h b/include/net/dst.h > > index 93568bd0a352..33e1d86bcef6 100644 > > --- a/include/net/dst.h > > +++ b/include/net/dst.h > > @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst) > > static inline void dst_use(struct dst_entry *dst, unsigned long time) > > { > > dst_hold(dst); > > - dst->__use++; > > - dst->lastuse = time; > > + if (dst->lastuse != time) { > > + dst->__use++; > > + dst->lastuse = time; > > + } > > } > > > > static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) > > { > > - dst->__use++; > > - dst->lastuse = time; > > + if (dst->lastuse != time) { > > + dst->__use++; > > + dst->lastuse = time; > > + } > > } > > > > static inline struct dst_entry *dst_clone(struct dst_entry *dst) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > index 26cc9f483b6d..e195f093add3 100644 > > --- a/net/ipv6/route.c > > +++ b/net/ipv6/route.c > > @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net, > > struct fib6_table *table, > > > > struct rt6_info *pcpu_rt; > > > > - rt->dst.lastuse = jiffies; > > - rt->dst.__use++; > > + dst_use_noref(rt, jiffies); > > pcpu_rt = rt6_get_pcpu_route(rt); > > > > if (pcpu_rt) { > > > > > > This way we always only update dst->__use and dst->lastuse at most > > once per jiffy. And we don't really need to update pcpu and then do > > the copy over from pcpu_rt to rt operation. > > > > Another thing is that I don't really see any places making use of > > dst->__use. So maybe we can also get rid of this dst->__use field? > > > > Thanks. > > Wei > > Paolo, given we are very close to send Wei awesome work about IPv6 > routing cache, > could we ask you to wait few days before doing the same work from your side ? Ok, no problem - thanks instead. I'll wait for it. > Main issue is the rwlock, and we are converting it to full RCU. > > You are sending patches that are making our job very difficult IMO. On my side I have only another small change in this area, I'll eventually try to rebase it later, if still relevant. Or I can share it now, if you are interested. Cheers, Paolo ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available 2017-09-27 17:44 ` Wei Wang 2017-09-27 18:03 ` Eric Dumazet @ 2017-09-27 18:25 ` Cong Wang 1 sibling, 0 replies; 6+ messages in thread From: Cong Wang @ 2017-09-27 18:25 UTC (permalink / raw) To: Wei Wang Cc: Paolo Abeni, Eric Dumazet, Martin KaFai Lau, Linux Kernel Network Developers, David S. Miller, Hannes Frederic Sowa On Wed, Sep 27, 2017 at 10:44 AM, Wei Wang <weiwan@google.com> wrote: > Another thing is that I don't really see any places making use of > dst->__use. So maybe we can also get rid of this dst->__use field? It is used in rtnl_put_cacheinfo(): struct rta_cacheinfo ci = { .rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse), .rta_used = dst->__use, .rta_clntref = atomic_read(&(dst->__refcnt)), .rta_error = error, .rta_id = id, }; ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-09-28 10:34 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-09-27 17:14 [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available Paolo Abeni 2017-09-27 17:44 ` Wei Wang 2017-09-27 18:03 ` Eric Dumazet 2017-09-27 19:01 ` Martin KaFai Lau 2017-09-28 10:34 ` Paolo Abeni 2017-09-27 18:25 ` Cong Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).