* [GIT PULL] IPVS updates for v3.13
@ 2013-10-15 2:01 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman
Hi Pablo,
please consider the following fixes for IPVS for v3.13.
This pull request is based on nf-next.
The following changes since commit 58308451e91974267e1f4a618346055342019e02:
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next (2013-10-10 15:29:44 -0400)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v3.13
for you to fetch changes up to 1255ce5f10dbb4646c8d43b8d59faab48ae4a6b2:
ipvs: improved SH fallback strategy (2013-10-15 10:54:50 +0900)
----------------------------------------------------------------
IPVS updates for v3.13
* Improvements to SH fallback strategy
* Avoid rcu_barrier during netns cleanup
* Fix the IPVS_CMD_ATTR_MAX definition
----------------------------------------------------------------
Alexander Frolkin (1):
ipvs: improved SH fallback strategy
Julian Anastasov (2):
ipvs: fix the IPVS_CMD_ATTR_MAX definition
ipvs: avoid rcu_barrier during netns cleanup
include/net/ip_vs.h | 6 ++++++
include/uapi/linux/ip_vs.h | 2 +-
net/netfilter/ipvs/ip_vs_ctl.c | 6 +-----
net/netfilter/ipvs/ip_vs_lblc.c | 2 +-
net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
net/netfilter/ipvs/ip_vs_sh.c | 39 +++++++++++++++++++++++++++++----------
6 files changed, 39 insertions(+), 18 deletions(-)
* [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
2013-10-15 2:01 ` [PATCH 3/3] ipvs: improved SH fallback strategy Simon Horman
2 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman

From: Julian Anastasov <ja@ssi.bg>

It was wrong (bigger) but problem is harmless.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/uapi/linux/ip_vs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
index 2945822..fbcffe8 100644
--- a/include/uapi/linux/ip_vs.h
+++ b/include/uapi/linux/ip_vs.h
@@ -334,7 +334,7 @@ enum {
 	__IPVS_CMD_ATTR_MAX,
 };

-#define IPVS_CMD_ATTR_MAX (__IPVS_SVC_ATTR_MAX - 1)
+#define IPVS_CMD_ATTR_MAX (__IPVS_CMD_ATTR_MAX - 1)

 /*
  * Attributes used to describe a service
--
1.8.4
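A note on the idiom behind this one-line fix: attribute enums of this kind usually end with a `__..._MAX` sentinel, and the public `..._MAX` macro is defined as that sentinel minus one. The sketch below is only an illustration under stated assumptions: the `DEMO_` enums, their entries and their lengths are made up and are not the kernel's definitions. It shows how borrowing the sentinel of a different, longer enum produces a maximum that is too large.

```c
#include <assert.h>

/* Hypothetical command attributes: two real entries plus a sentinel. */
enum {
	DEMO_CMD_ATTR_UNSPEC = 0,
	DEMO_CMD_ATTR_SERVICE,
	DEMO_CMD_ATTR_DEST,
	__DEMO_CMD_ATTR_MAX,	/* sentinel: one past the last valid entry */
};

/* Hypothetical service attributes: a longer list with its own sentinel. */
enum {
	DEMO_SVC_ATTR_UNSPEC = 0,
	DEMO_SVC_ATTR_AF,
	DEMO_SVC_ATTR_PROTOCOL,
	DEMO_SVC_ATTR_ADDR,
	DEMO_SVC_ATTR_PORT,
	__DEMO_SVC_ATTR_MAX,	/* sentinel of the other enum */
};

/* The mistake: the command maximum derived from the service sentinel. */
#define DEMO_CMD_ATTR_MAX_WRONG	(__DEMO_SVC_ATTR_MAX - 1)

/* The fix: derive the maximum from the enum's own sentinel. */
#define DEMO_CMD_ATTR_MAX	(__DEMO_CMD_ATTR_MAX - 1)

/* Number of slots a table indexed by attribute number would need. */
int demo_table_slots(int max_attr)
{
	assert(max_attr >= 0);
	return max_attr + 1;
}
```

In this sketch a table sized from the wrong macro only carries unused trailing slots, which is one plausible reading of why the commit message calls the problem harmless.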
* [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2013-10-16 10:43 ` Pablo Neira Ayuso
2013-10-15 2:01 ` [PATCH 3/3] ipvs: improved SH fallback strategy Simon Horman
2 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman

From: Julian Anastasov <ja@ssi.bg>

commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
rcu_barrier() on cleanup to wait dest users and schedulers
like LBLC and LBLCR to put their last dest reference.
Using rcu_barrier with many namespaces is problematic.

Trying to fix it by freeing dest with kfree_rcu is not
a solution, RCU callbacks can run in parallel and execution
order is random.

Fix it by creating new function ip_vs_dest_put_and_free()
which is heavier than ip_vs_dest_put(). We will use it just
for schedulers like LBLC, LBLCR that can delay their dest
release.

By default, dests reference is above 0 if they are present in
service and it is 0 when deleted but still in trash list.
Change the dest trash code to use ip_vs_dest_put_and_free(),
so that refcnt -1 can be used for freeing. As result,
such checks remain in slow path and the rcu_barrier() from
netns cleanup can be removed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h              | 6 ++++++
 net/netfilter/ipvs/ip_vs_ctl.c   | 6 +-----
 net/netfilter/ipvs/ip_vs_lblc.c  | 2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..cd7275f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1442,6 +1442,12 @@ static inline void ip_vs_dest_put(struct ip_vs_dest *dest)
 	atomic_dec(&dest->refcnt);
 }

+static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
+{
+	if (atomic_dec_return(&dest->refcnt) < 0)
+		kfree(dest);
+}
+
 /*
  * IPVS sync daemon data and function prototypes
  * (from ip_vs_sync.c)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index a3df9bd..62786a4 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -704,7 +704,7 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest)
 	__ip_vs_dst_cache_reset(dest);
 	__ip_vs_svc_put(svc, false);
 	free_percpu(dest->stats.cpustats);
-	kfree(dest);
+	ip_vs_dest_put_and_free(dest);
 }

 /*
@@ -3820,10 +3820,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);

-	/* Some dest can be in grace period even before cleanup, we have to
-	 * defer ip_vs_trash_cleanup until ip_vs_dest_wait_readers is called.
-	 */
-	rcu_barrier();
 	ip_vs_trash_cleanup(net);
 	ip_vs_stop_estimator(net, &ipvs->tot_stats);
 	ip_vs_control_net_cleanup_sysctl(net);
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index eff13c9..ca056a3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -136,7 +136,7 @@ static void ip_vs_lblc_rcu_free(struct rcu_head *head)
 						   struct ip_vs_lblc_entry,
 						   rcu_head);

-	ip_vs_dest_put(en->dest);
+	ip_vs_dest_put_and_free(en->dest);
 	kfree(en);
 }

diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 0b85500..3f21a2f 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -130,7 +130,7 @@ static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head)
 	struct ip_vs_dest_set_elem *e;

 	e = container_of(head, struct ip_vs_dest_set_elem, rcu_head);
-	ip_vs_dest_put(e->dest);
+	ip_vs_dest_put_and_free(e->dest);
 	kfree(e);
 }

--
1.8.4
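The refcount convention described in the commit message above (above 0 while in a service, 0 while in the trash list, -1 meaning "free it") can be sketched in userspace C. This is a minimal illustration, not the kernel code: the `demo_` names are hypothetical, C11 `<stdatomic.h>` and `free()` stand in for the kernel's `atomic_t` and `kfree()`, and the boolean return value exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ip_vs_dest. */
struct demo_dest {
	atomic_int refcnt;	/* >0: referenced, 0: in trash, -1: free it */
};

/* Allocate a dest holding "refs" references. */
struct demo_dest *demo_dest_new(int refs)
{
	struct demo_dest *dest = malloc(sizeof(*dest));

	if (dest)
		atomic_init(&dest->refcnt, refs);
	return dest;
}

/* Cheap put for the fast path: drops a reference and never frees. */
void demo_dest_put(struct demo_dest *dest)
{
	atomic_fetch_sub(&dest->refcnt, 1);
}

/* Heavier put: whichever caller drives the count below zero frees dest.
 * Returns true if this call freed it (the kernel helper returns void).
 */
bool demo_dest_put_and_free(struct demo_dest *dest)
{
	/* atomic_fetch_sub() returns the old value, so old - 1 is the new one */
	if (atomic_fetch_sub(&dest->refcnt, 1) - 1 < 0) {
		free(dest);
		return true;
	}
	return false;
}
```

With one scheduler reference outstanding on a trashed dest, the trash cleanup and the scheduler's delayed release can call `demo_dest_put_and_free()` in either order. The first call brings the count to 0 and the second brings it to -1 and frees, so in this sketch neither side has to wait for the other.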
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
@ 2013-10-16 10:43 ` Pablo Neira Ayuso
2013-10-16 19:52 ` Julian Anastasov
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2013-10-16 10:43 UTC
To: Simon Horman
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang, Julian Anastasov

On Tue, Oct 15, 2013 at 11:01:46AM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
>
> commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
> rcu_barrier() on cleanup to wait dest users and schedulers
> like LBLC and LBLCR to put their last dest reference.
> Using rcu_barrier with many namespaces is problematic.
>
> Trying to fix it by freeing dest with kfree_rcu is not
> a solution, RCU callbacks can run in parallel and execution
> order is random.
>
> Fix it by creating new function ip_vs_dest_put_and_free()
> which is heavier than ip_vs_dest_put(). We will use it just
> for schedulers like LBLC, LBLCR that can delay their dest
> release.
>
> By default, dests reference is above 0 if they are present in
> service and it is 0 when deleted but still in trash list.
> Change the dest trash code to use ip_vs_dest_put_and_free(),
> so that refcnt -1 can be used for freeing. As result,
> such checks remain in slow path and the rcu_barrier() from
> netns cleanup can be removed.

I can enqueue this fix to nf if you like. No need to resend, I can
manually apply.

Let me know.
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-16 10:43 ` Pablo Neira Ayuso
@ 2013-10-16 19:52 ` Julian Anastasov
2013-10-17 0:49 ` Simon Horman
0 siblings, 1 reply; 9+ messages in thread
From: Julian Anastasov @ 2013-10-16 19:52 UTC
To: Pablo Neira Ayuso
Cc: Simon Horman, lvs-devel, netdev, netfilter-devel, Wensong Zhang

Hello,

On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:

> I can enqueue this fix to nf if you like. No need to resend, I can
> manually apply.
>
> Let me know.

It is not critical. I waited weeks the net tree to be
copied into net-next because it collides with the recent
"ipvs: make the service replacement more robust" change in
net tree :) But if a rcu_barrier in the netns cleanup looks
scary enough you can push it to nf. IMHO, it just adds
unneeded delay there.

Regards

--
Julian Anastasov <ja@ssi.bg>
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-16 19:52 ` Julian Anastasov
@ 2013-10-17 0:49 ` Simon Horman
2013-10-17 8:11 ` Pablo Neira Ayuso
0 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2013-10-17 0:49 UTC
To: Julian Anastasov
Cc: Pablo Neira Ayuso, lvs-devel, netdev, netfilter-devel, Wensong Zhang

On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
>
> > I can enqueue this fix to nf if you like. No need to resend, I can
> > manually apply.
> >
> > Let me know.
>
> It is not critical. I waited weeks the net tree to be
> copied into net-next because it collides with the recent
> "ipvs: make the service replacement more robust" change in
> net tree :) But if a rcu_barrier in the netns cleanup looks
> scary enough you can push it to nf. IMHO, it just adds
> unneeded delay there.

If it is not critical I would prefer for it to travel through
nf-next. Though I do not feel strongly about this.
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-17 0:49 ` Simon Horman
@ 2013-10-17 8:11 ` Pablo Neira Ayuso
2013-10-17 8:30 ` Simon Horman
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2013-10-17 8:11 UTC
To: Simon Horman
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel, Wensong Zhang

On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> >
> > Hello,
> >
> > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> >
> > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > manually apply.
> > >
> > > Let me know.
> >
> > It is not critical. I waited weeks the net tree to be
> > copied into net-next because it collides with the recent
> > "ipvs: make the service replacement more robust" change in
> > net tree :) But if a rcu_barrier in the netns cleanup looks
> > scary enough you can push it to nf. IMHO, it just adds
> > unneeded delay there.
>
> If it is not critical I would prefer for it to travel through
> nf-next. Though I do not feel strongly about this.

Will enqueue for nf-next.

I'd appreciate if you can recover the tradition of attaching a short
evaluation in the cover letter as I do when I send pull requests to
David. Thanks!
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-17 8:11 ` Pablo Neira Ayuso
@ 2013-10-17 8:30 ` Simon Horman
0 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-17 8:30 UTC
To: Pablo Neira Ayuso
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel, Wensong Zhang

On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > >
> > > Hello,
> > >
> > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > >
> > > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > > manually apply.
> > > >
> > > > Let me know.
> > >
> > > It is not critical. I waited weeks the net tree to be
> > > copied into net-next because it collides with the recent
> > > "ipvs: make the service replacement more robust" change in
> > > net tree :) But if a rcu_barrier in the netns cleanup looks
> > > scary enough you can push it to nf. IMHO, it just adds
> > > unneeded delay there.
> >
> > If it is not critical I would prefer for it to travel through
> > nf-next. Though I do not feel strongly about this.
>
> Will enqueue for nf-next.
>
> I'd appreciate if you can recover the tradition of attaching a short
> evaluation in the cover letter as I do when I send pull requests to
> David. Thanks!

Sure, will do.
* [PATCH 3/3] ipvs: improved SH fallback strategy
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Alexander Frolkin, Simon Horman

From: Alexander Frolkin <avf@eldamar.org.uk>

Improve the SH fallback realserver selection strategy.

With sh and sh-fallback, if a realserver is down, this attempts to
distribute the traffic that would have gone to that server evenly
among the remaining servers.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_sh.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 3588fae..cc65b2f 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -115,27 +115,46 @@ ip_vs_sh_get(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 }


-/* As ip_vs_sh_get, but with fallback if selected server is unavailable */
+/* As ip_vs_sh_get, but with fallback if selected server is unavailable
+ *
+ * The fallback strategy loops around the table starting from a "random"
+ * point (in fact, it is chosen to be the original hash value to make the
+ * algorithm deterministic) to find a new server.
+ */
 static inline struct ip_vs_dest *
 ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 		      const union nf_inet_addr *addr, __be16 port)
 {
-	unsigned int offset;
-	unsigned int hash;
+	unsigned int offset, roffset;
+	unsigned int hash, ihash;
 	struct ip_vs_dest *dest;

+	/* first try the dest it's supposed to go to */
+	ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
+	dest = rcu_dereference(s->buckets[ihash].dest);
+	if (!dest)
+		return NULL;
+	if (!is_unavailable(dest))
+		return dest;
+
+	IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting",
+		      IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port));
+
+	/* if the original dest is unavailable, loop around the table
+	 * starting from ihash to find a new dest
+	 */
 	for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
-		hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
+		roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
+		hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
 		dest = rcu_dereference(s->buckets[hash].dest);
 		if (!dest)
 			break;
-		if (is_unavailable(dest))
-			IP_VS_DBG_BUF(6, "SH: selected unavailable server "
-				      "%s:%d (offset %d)",
-				      IP_VS_DBG_ADDR(svc->af, &dest->addr),
-				      ntohs(dest->port), offset);
-		else
+		if (!is_unavailable(dest))
 			return dest;
+		IP_VS_DBG_BUF(6, "SH: selected unavailable "
+			      "server %s:%d (offset %d), reselecting",
+			      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+			      ntohs(dest->port), roffset);
 	}

 	return NULL;
--
1.8.4
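The control flow of the new fallback lookup can be tried out in a small userspace sketch. Everything here is an assumption made for illustration: the `demo_` names, the 8-entry table, the boolean `available` flag standing in for `is_unavailable()`, and above all `demo_hashkey()`, which is a toy stand-in and not the kernel's `ip_vs_sh_hashkey()`. RCU and debug logging are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TAB_SIZE 8u	/* hypothetical table size */

/* Hypothetical stand-in for struct ip_vs_dest. */
struct demo_dest {
	int id;
	bool available;
};

/* Toy hash over (key, offset); NOT the kernel's ip_vs_sh_hashkey(). */
unsigned int demo_hashkey(uint32_t key, uint32_t offset)
{
	return (offset + key * 2654435761u) % DEMO_TAB_SIZE;
}

/* Mirrors the patched flow: try the original bucket first, then walk
 * DEMO_TAB_SIZE offsets starting from the original hash value.
 */
struct demo_dest *demo_get_fallback(struct demo_dest **buckets, uint32_t key)
{
	unsigned int offset, roffset, hash, ihash;
	struct demo_dest *dest;

	/* first try the dest the key is supposed to map to */
	ihash = demo_hashkey(key, 0);
	dest = buckets[ihash];
	if (!dest)
		return NULL;
	if (dest->available)
		return dest;

	/* original dest is down: rehash with offsets that start at ihash */
	for (offset = 0; offset < DEMO_TAB_SIZE; offset++) {
		roffset = (offset + ihash) % DEMO_TAB_SIZE;
		hash = demo_hashkey(key, roffset);
		dest = buckets[hash];
		if (!dest)
			break;
		if (dest->available)
			return dest;
	}

	return NULL;
}
```

Because the starting offset is derived from the key's own hash, the same key keeps landing on the same replacement while the failed server stays down, and keys with different hash values start their search at different points. How evenly that spreads load depends on the real hash function and table layout, which this sketch does not model.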