* [GIT PULL] IPVS updates for v3.13
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang, Julian Anastasov, Simon Horman

Hi Pablo,

please consider the following fixes for IPVS for v3.13.

This pull request is based on nf-next.

The following changes since commit 58308451e91974267e1f4a618346055342019e02:

  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next (2013-10-10 15:29:44 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v3.13

for you to fetch changes up to 1255ce5f10dbb4646c8d43b8d59faab48ae4a6b2:

  ipvs: improved SH fallback strategy (2013-10-15 10:54:50 +0900)

----------------------------------------------------------------
IPVS updates for v3.13

* Improvements to SH fallback strategy
* Avoid rcu_barrier during netns cleanup
* Fix the IPVS_CMD_ATTR_MAX definition

----------------------------------------------------------------
Alexander Frolkin (1):
      ipvs: improved SH fallback strategy

Julian Anastasov (2):
      ipvs: fix the IPVS_CMD_ATTR_MAX definition
      ipvs: avoid rcu_barrier during netns cleanup

 include/net/ip_vs.h              |  6 ++++++
 include/uapi/linux/ip_vs.h       |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c   |  6 +-----
 net/netfilter/ipvs/ip_vs_lblc.c  |  2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c |  2 +-
 net/netfilter/ipvs/ip_vs_sh.c    | 39 +++++++++++++++++++++++++++++----------
 6 files changed, 39 insertions(+), 18 deletions(-)

* [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang, Julian Anastasov, Simon Horman

From: Julian Anastasov <ja@ssi.bg>

It was wrong (bigger) but problem is harmless.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/uapi/linux/ip_vs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
index 2945822..fbcffe8 100644
--- a/include/uapi/linux/ip_vs.h
+++ b/include/uapi/linux/ip_vs.h
@@ -334,7 +334,7 @@ enum {
 	__IPVS_CMD_ATTR_MAX,
 };
 
-#define IPVS_CMD_ATTR_MAX (__IPVS_SVC_ATTR_MAX - 1)
+#define IPVS_CMD_ATTR_MAX (__IPVS_CMD_ATTR_MAX - 1)
 
 /*
  * Attributes used to describe a service
-- 
1.8.4
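
For readers unfamiliar with the sentinel-enum idiom that patch 1/3 touches, here is a minimal userspace sketch. All DEMO_* names and the attribute lists are hypothetical and are not the real IPVS enums; demo_table_slots() is likewise an illustration only. It shows how taking the maximum from the wrong enum's sentinel can yield a larger value, which presumably only oversizes anything dimensioned by it, consistent with "wrong (bigger) but harmless".

```c
#include <assert.h>

/* Hypothetical command attributes; the last entry is a sentinel. */
enum {
	DEMO_CMD_ATTR_UNSPEC = 0,
	DEMO_CMD_ATTR_SERVICE,
	DEMO_CMD_ATTR_DEST,
	__DEMO_CMD_ATTR_MAX,	/* one past the last command attribute */
};

/* Hypothetical service attributes; this list happens to be longer. */
enum {
	DEMO_SVC_ATTR_UNSPEC = 0,
	DEMO_SVC_ATTR_AF,
	DEMO_SVC_ATTR_PROTOCOL,
	DEMO_SVC_ATTR_ADDR,
	DEMO_SVC_ATTR_PORT,
	__DEMO_SVC_ATTR_MAX,	/* one past the last service attribute */
};

/* Same shape as the old definition: derived from the wrong sentinel. */
#define DEMO_CMD_ATTR_MAX_BUGGY	(__DEMO_SVC_ATTR_MAX - 1)
/* Same shape as the fixed definition. */
#define DEMO_CMD_ATTR_MAX	(__DEMO_CMD_ATTR_MAX - 1)

/* Slots needed for a table indexed by attribute number 0..max. */
static int demo_table_slots(int max)
{
	assert(max >= 0);
	return max + 1;
}
```

With these made-up lists, the buggy maximum is 4 and the correct one is 2, so a table sized from the buggy value would merely carry two unused slots.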

* [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang, Julian Anastasov, Simon Horman

From: Julian Anastasov <ja@ssi.bg>

commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
rcu_barrier() on cleanup to wait dest users and schedulers
like LBLC and LBLCR to put their last dest reference.
Using rcu_barrier with many namespaces is problematic.

Trying to fix it by freeing dest with kfree_rcu is not
a solution, RCU callbacks can run in parallel and execution
order is random.

Fix it by creating new function ip_vs_dest_put_and_free()
which is heavier than ip_vs_dest_put(). We will use it just
for schedulers like LBLC, LBLCR that can delay their dest
release.

By default, dests reference is above 0 if they are present in
service and it is 0 when deleted but still in trash list.
Change the dest trash code to use ip_vs_dest_put_and_free(),
so that refcnt -1 can be used for freeing. As result,
such checks remain in slow path and the rcu_barrier() from
netns cleanup can be removed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h              | 6 ++++++
 net/netfilter/ipvs/ip_vs_ctl.c   | 6 +-----
 net/netfilter/ipvs/ip_vs_lblc.c  | 2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..cd7275f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1442,6 +1442,12 @@ static inline void ip_vs_dest_put(struct ip_vs_dest *dest)
 	atomic_dec(&dest->refcnt);
 }
 
+static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
+{
+	if (atomic_dec_return(&dest->refcnt) < 0)
+		kfree(dest);
+}
+
 /*
  *      IPVS sync daemon data and function prototypes
  *      (from ip_vs_sync.c)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index a3df9bd..62786a4 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -704,7 +704,7 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest)
 	__ip_vs_dst_cache_reset(dest);
 	__ip_vs_svc_put(svc, false);
 	free_percpu(dest->stats.cpustats);
-	kfree(dest);
+	ip_vs_dest_put_and_free(dest);
 }
 
 /*
@@ -3820,10 +3820,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
-	/* Some dest can be in grace period even before cleanup, we have to
-	 * defer ip_vs_trash_cleanup until ip_vs_dest_wait_readers is called.
-	 */
-	rcu_barrier();
 	ip_vs_trash_cleanup(net);
 	ip_vs_stop_estimator(net, &ipvs->tot_stats);
 	ip_vs_control_net_cleanup_sysctl(net);
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index eff13c9..ca056a3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -136,7 +136,7 @@ static void ip_vs_lblc_rcu_free(struct rcu_head *head)
 						   struct ip_vs_lblc_entry,
 						   rcu_head);
 
-	ip_vs_dest_put(en->dest);
+	ip_vs_dest_put_and_free(en->dest);
 	kfree(en);
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 0b85500..3f21a2f 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -130,7 +130,7 @@ static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head)
 	struct ip_vs_dest_set_elem *e;
 
 	e = container_of(head, struct ip_vs_dest_set_elem, rcu_head);
-	ip_vs_dest_put(e->dest);
+	ip_vs_dest_put_and_free(e->dest);
 	kfree(e);
 }
 
-- 
1.8.4
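
The refcount convention described in patch 2/3 (above 0 while in a service, 0 while in the trash, -1 meaning "free now") can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct demo_dest and the demo_* helpers are hypothetical, C11 atomics stand in for the kernel's atomic_t, and unlike the kernel helper the put-and-free variant returns a bool so the outcome can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ip_vs_dest. */
struct demo_dest {
	atomic_int refcnt;	/* > 0: referenced, 0: idle in trash, -1: freed */
};

static struct demo_dest *demo_dest_new(void)
{
	struct demo_dest *d = malloc(sizeof(*d));

	if (d)
		atomic_init(&d->refcnt, 1);	/* reference held by the service */
	return d;
}

static void demo_dest_hold(struct demo_dest *d)
{
	atomic_fetch_add(&d->refcnt, 1);
}

/* Cheap put for fast paths: never frees. */
static void demo_dest_put(struct demo_dest *d)
{
	atomic_fetch_sub(&d->refcnt, 1);
}

/* Heavier put for slow paths: whoever drives the count below zero
 * frees the object. Returns true if this call freed it (the return
 * value is an addition for illustration only).
 */
static bool demo_dest_put_and_free(struct demo_dest *d)
{
	/* atomic_fetch_sub() returns the old value, so subtract 1 to get
	 * the new one, mirroring atomic_dec_return() in the patch.
	 */
	if (atomic_fetch_sub(&d->refcnt, 1) - 1 < 0) {
		free(d);
		return true;
	}
	return false;
}
```

In this model, the trash cleanup and a delayed scheduler callback may each call demo_dest_put_and_free() in either order; only the second caller sees the count drop to -1 and frees the object, so neither side has to wait for the other.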

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Pablo Neira Ayuso @ 2013-10-16 10:43 UTC
To: Simon Horman
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang, Julian Anastasov

On Tue, Oct 15, 2013 at 11:01:46AM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
>
> commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
> rcu_barrier() on cleanup to wait dest users and schedulers
> like LBLC and LBLCR to put their last dest reference.
> Using rcu_barrier with many namespaces is problematic.
>
> Trying to fix it by freeing dest with kfree_rcu is not
> a solution, RCU callbacks can run in parallel and execution
> order is random.
>
> Fix it by creating new function ip_vs_dest_put_and_free()
> which is heavier than ip_vs_dest_put(). We will use it just
> for schedulers like LBLC, LBLCR that can delay their dest
> release.
>
> By default, dests reference is above 0 if they are present in
> service and it is 0 when deleted but still in trash list.
> Change the dest trash code to use ip_vs_dest_put_and_free(),
> so that refcnt -1 can be used for freeing. As result,
> such checks remain in slow path and the rcu_barrier() from
> netns cleanup can be removed.

I can enqueue this fix to nf if you like. No need to resend, I can
manually apply.

Let me know.

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Julian Anastasov @ 2013-10-16 19:52 UTC
To: Pablo Neira Ayuso
Cc: Simon Horman, lvs-devel, netdev, netfilter-devel, Wensong Zhang

	Hello,

On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:

> I can enqueue this fix to nf if you like. No need to resend, I can
> manually apply.
>
> Let me know.

	It is not critical. I waited weeks the net tree to be
copied into net-next because it collides with the recent
"ipvs: make the service replacement more robust" change in
net tree :) But if a rcu_barrier in the netns cleanup looks
scary enough you can push it to nf. IMHO, it just adds
unneeded delay there.

Regards

--
Julian Anastasov <ja@ssi.bg>

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Simon Horman @ 2013-10-17 0:49 UTC
To: Julian Anastasov
Cc: Pablo Neira Ayuso, lvs-devel, netdev, netfilter-devel, Wensong Zhang

On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
>
> 	Hello,
>
> On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
>
> > I can enqueue this fix to nf if you like. No need to resend, I can
> > manually apply.
> >
> > Let me know.
>
> 	It is not critical. I waited weeks the net tree to be
> copied into net-next because it collides with the recent
> "ipvs: make the service replacement more robust" change in
> net tree :) But if a rcu_barrier in the netns cleanup looks
> scary enough you can push it to nf. IMHO, it just adds
> unneeded delay there.

If it is not critical I would prefer for it to travel through
nf-next. Though I do not feel strongly about this.

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Pablo Neira Ayuso @ 2013-10-17 8:11 UTC
To: Simon Horman
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel, Wensong Zhang

On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> >
> > 	Hello,
> >
> > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> >
> > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > manually apply.
> > >
> > > Let me know.
> >
> > 	It is not critical. I waited weeks the net tree to be
> > copied into net-next because it collides with the recent
> > "ipvs: make the service replacement more robust" change in
> > net tree :) But if a rcu_barrier in the netns cleanup looks
> > scary enough you can push it to nf. IMHO, it just adds
> > unneeded delay there.
>
> If it is not critical I would prefer for it to travel through
> nf-next. Though I do not feel strongly about this.

Will enqueue for nf-next.

I'd appreciate if you can recover the tradition of attaching a short
evaluation in the cover letter as I do when I send pull requests to
David. Thanks!

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Simon Horman @ 2013-10-17 8:30 UTC
To: Pablo Neira Ayuso
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel, Wensong Zhang

On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > >
> > > 	Hello,
> > >
> > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > >
> > > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > > manually apply.
> > > >
> > > > Let me know.
> > >
> > > 	It is not critical. I waited weeks the net tree to be
> > > copied into net-next because it collides with the recent
> > > "ipvs: make the service replacement more robust" change in
> > > net tree :) But if a rcu_barrier in the netns cleanup looks
> > > scary enough you can push it to nf. IMHO, it just adds
> > > unneeded delay there.
> >
> > If it is not critical I would prefer for it to travel through
> > nf-next. Though I do not feel strongly about this.
>
> Will enqueue for nf-next.
>
> I'd appreciate if you can recover the tradition of attaching a short
> evaluation in the cover letter as I do when I send pull requests to
> David. Thanks!

Sure, will do.

* [PATCH 3/3] ipvs: improved SH fallback strategy
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang, Julian Anastasov, Alexander Frolkin, Simon Horman

From: Alexander Frolkin <avf@eldamar.org.uk>

Improve the SH fallback realserver selection strategy.

With sh and sh-fallback, if a realserver is down, this attempts to
distribute the traffic that would have gone to that server evenly
among the remaining servers.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_sh.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 3588fae..cc65b2f 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -115,27 +115,46 @@ ip_vs_sh_get(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 }
 
 
-/* As ip_vs_sh_get, but with fallback if selected server is unavailable */
+/* As ip_vs_sh_get, but with fallback if selected server is unavailable
+ *
+ * The fallback strategy loops around the table starting from a "random"
+ * point (in fact, it is chosen to be the original hash value to make the
+ * algorithm deterministic) to find a new server.
+ */
 static inline struct ip_vs_dest *
 ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 		      const union nf_inet_addr *addr, __be16 port)
 {
-	unsigned int offset;
-	unsigned int hash;
+	unsigned int offset, roffset;
+	unsigned int hash, ihash;
 	struct ip_vs_dest *dest;
 
+	/* first try the dest it's supposed to go to */
+	ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
+	dest = rcu_dereference(s->buckets[ihash].dest);
+	if (!dest)
+		return NULL;
+	if (!is_unavailable(dest))
+		return dest;
+
+	IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting",
+		      IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port));
+
+	/* if the original dest is unavailable, loop around the table
+	 * starting from ihash to find a new dest
+	 */
 	for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
-		hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
+		roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
+		hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
 		dest = rcu_dereference(s->buckets[hash].dest);
 		if (!dest)
 			break;
-		if (is_unavailable(dest))
-			IP_VS_DBG_BUF(6, "SH: selected unavailable server "
-				      "%s:%d (offset %d)",
-				      IP_VS_DBG_ADDR(svc->af, &dest->addr),
-				      ntohs(dest->port), offset);
-		else
+		if (!is_unavailable(dest))
 			return dest;
+		IP_VS_DBG_BUF(6, "SH: selected unavailable "
+			      "server %s:%d (offset %d), reselecting",
+			      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+			      ntohs(dest->port), roffset);
 	}
 
 	return NULL;
-- 
1.8.4
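
The control flow of the new fallback in patch 3/3 can be tried outside the kernel with a small model like the one below. Everything named demo_* is hypothetical: the table has 8 buckets rather than the real IP_VS_SH_TAB_SIZE, demo_hashkey() is a made-up hash that only mimics the "base hash plus offset" shape, and RCU and debug logging are left out. It is a sketch of the probing order, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TAB_SIZE 8u	/* assumed size, for illustration only */

/* Hypothetical stand-in for a realserver. */
struct demo_server {
	bool available;
};

/* Made-up hash: a multiplicative mix of the key, shifted by offset. */
static unsigned int demo_hashkey(unsigned int key, unsigned int offset)
{
	return (key * 2654435761u + offset) % DEMO_TAB_SIZE;
}

static const struct demo_server *
demo_get_fallback(const struct demo_server *const *buckets, unsigned int key)
{
	unsigned int offset, roffset, hash, ihash;
	const struct demo_server *dest;

	assert(buckets != NULL);

	/* First try the bucket the key is supposed to map to. */
	ihash = demo_hashkey(key, 0);
	dest = buckets[ihash];
	if (!dest)
		return NULL;
	if (dest->available)
		return dest;

	/* Otherwise walk the table once; the start of the walk depends on
	 * the original hash, so it is deterministic for a given key.
	 */
	for (offset = 0; offset < DEMO_TAB_SIZE; offset++) {
		roffset = (offset + ihash) % DEMO_TAB_SIZE;
		hash = demo_hashkey(key, roffset);
		dest = buckets[hash];
		if (!dest)
			break;
		if (dest->available)
			return dest;
	}

	return NULL;
}
```

Because the walk begins at a position derived from the key's own hash instead of always at offset 0, keys that shared the failed bucket start probing at a position tied to that bucket rather than at one fixed point for the whole table, which is the intent stated in the commit message.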