* [GIT PULL] IPVS updates for v3.13
@ 2013-10-15 2:01 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman
Hi Pablo,
please consider the following fixes for IPVS for v3.13.
This pull request is based on nf-next.
The following changes since commit 58308451e91974267e1f4a618346055342019e02:
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next (2013-10-10 15:29:44 -0400)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v3.13
for you to fetch changes up to 1255ce5f10dbb4646c8d43b8d59faab48ae4a6b2:
ipvs: improved SH fallback strategy (2013-10-15 10:54:50 +0900)
----------------------------------------------------------------
IPVS updates for v3.13
* Improvements to SH fallback strategy
* Avoid rcu_barrier during netns cleanup
* Fix the IPVS_CMD_ATTR_MAX definition
----------------------------------------------------------------
Alexander Frolkin (1):
ipvs: improved SH fallback strategy
Julian Anastasov (2):
ipvs: fix the IPVS_CMD_ATTR_MAX definition
ipvs: avoid rcu_barrier during netns cleanup
include/net/ip_vs.h | 6 ++++++
include/uapi/linux/ip_vs.h | 2 +-
net/netfilter/ipvs/ip_vs_ctl.c | 6 +-----
net/netfilter/ipvs/ip_vs_lblc.c | 2 +-
net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
net/netfilter/ipvs/ip_vs_sh.c | 39 +++++++++++++++++++++++++++++----------
6 files changed, 39 insertions(+), 18 deletions(-)
* [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
2013-10-15 2:01 ` [PATCH 3/3] ipvs: improved SH fallback strategy Simon Horman
2 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman
From: Julian Anastasov <ja@ssi.bg>
The definition was wrong: it used __IPVS_SVC_ATTR_MAX, which gives a
bigger value. The problem is harmless.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/uapi/linux/ip_vs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
index 2945822..fbcffe8 100644
--- a/include/uapi/linux/ip_vs.h
+++ b/include/uapi/linux/ip_vs.h
@@ -334,7 +334,7 @@ enum {
__IPVS_CMD_ATTR_MAX,
};
-#define IPVS_CMD_ATTR_MAX (__IPVS_SVC_ATTR_MAX - 1)
+#define IPVS_CMD_ATTR_MAX (__IPVS_CMD_ATTR_MAX - 1)
/*
* Attributes used to describe a service
--
1.8.4
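For readers unfamiliar with the sentinel idiom in this patch, here is a minimal userspace sketch of why the old definition was "bigger". All DEMO_* names and enum members are hypothetical stand-ins, not the real IPVS attribute lists. The only assumption is that the service enum has more members than the command enum, which is what the commit message implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command attributes (stand-in for the IPVS_CMD_ATTR_* enum). */
enum {
	DEMO_CMD_ATTR_UNSPEC = 0,
	DEMO_CMD_ATTR_SERVICE,
	DEMO_CMD_ATTR_DEST,
	DEMO_CMD_ATTR_END_,	/* sentinel, plays the role of __IPVS_CMD_ATTR_MAX */
};

/* Hypothetical service attributes; assumed here to be the longer list. */
enum {
	DEMO_SVC_ATTR_UNSPEC = 0,
	DEMO_SVC_ATTR_AF,
	DEMO_SVC_ATTR_PROTOCOL,
	DEMO_SVC_ATTR_ADDR,
	DEMO_SVC_ATTR_PORT,
	DEMO_SVC_ATTR_END_,	/* sentinel, plays the role of __IPVS_SVC_ATTR_MAX */
};

/* Old form: derived from the wrong enum's sentinel. */
#define DEMO_CMD_ATTR_MAX_OLD	(DEMO_SVC_ATTR_END_ - 1)
/* Fixed form: derived from the command enum's own sentinel. */
#define DEMO_CMD_ATTR_MAX	(DEMO_CMD_ATTR_END_ - 1)

/* Number of slots in a table indexed by attribute type, 0..attr_max. */
static size_t demo_table_slots(int attr_max)
{
	assert(attr_max >= 0);
	return (size_t)attr_max + 1;
}
```

In this sketch a table sized from the old macro only carries unused trailing slots, which is presumably why the commit message calls the problem harmless.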
* [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2013-10-16 10:43 ` Pablo Neira Ayuso
2013-10-15 2:01 ` [PATCH 3/3] ipvs: improved SH fallback strategy Simon Horman
2 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman
From: Julian Anastasov <ja@ssi.bg>
commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
rcu_barrier() on cleanup to wait for dest users and schedulers
like LBLC and LBLCR to put their last dest reference.
Using rcu_barrier() with many namespaces is problematic.
Trying to fix it by freeing dest with kfree_rcu() is not
a solution: RCU callbacks can run in parallel and their execution
order is random.
Fix it by creating a new function, ip_vs_dest_put_and_free(),
which is heavier than ip_vs_dest_put(). We will use it only
for schedulers like LBLC and LBLCR that can delay their dest
release.
By default, a dest's reference count is above 0 while it is present in
a service, and it is 0 when deleted but still in the trash list.
Change the dest trash code to use ip_vs_dest_put_and_free(),
so that refcnt -1 can be used for freeing. As a result,
such checks remain in the slow path and the rcu_barrier() from
netns cleanup can be removed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 6 ++++++
net/netfilter/ipvs/ip_vs_ctl.c | 6 +-----
net/netfilter/ipvs/ip_vs_lblc.c | 2 +-
net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
4 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..cd7275f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1442,6 +1442,12 @@ static inline void ip_vs_dest_put(struct ip_vs_dest *dest)
atomic_dec(&dest->refcnt);
}
+static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
+{
+ if (atomic_dec_return(&dest->refcnt) < 0)
+ kfree(dest);
+}
+
/*
* IPVS sync daemon data and function prototypes
* (from ip_vs_sync.c)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index a3df9bd..62786a4 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -704,7 +704,7 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest)
__ip_vs_dst_cache_reset(dest);
__ip_vs_svc_put(svc, false);
free_percpu(dest->stats.cpustats);
- kfree(dest);
+ ip_vs_dest_put_and_free(dest);
}
/*
@@ -3820,10 +3820,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
- /* Some dest can be in grace period even before cleanup, we have to
- * defer ip_vs_trash_cleanup until ip_vs_dest_wait_readers is called.
- */
- rcu_barrier();
ip_vs_trash_cleanup(net);
ip_vs_stop_estimator(net, &ipvs->tot_stats);
ip_vs_control_net_cleanup_sysctl(net);
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index eff13c9..ca056a3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -136,7 +136,7 @@ static void ip_vs_lblc_rcu_free(struct rcu_head *head)
struct ip_vs_lblc_entry,
rcu_head);
- ip_vs_dest_put(en->dest);
+ ip_vs_dest_put_and_free(en->dest);
kfree(en);
}
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 0b85500..3f21a2f 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -130,7 +130,7 @@ static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head)
struct ip_vs_dest_set_elem *e;
e = container_of(head, struct ip_vs_dest_set_elem, rcu_head);
- ip_vs_dest_put(e->dest);
+ ip_vs_dest_put_and_free(e->dest);
kfree(e);
}
--
1.8.4
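The refcount convention described in this patch can be modeled in userspace. The following is a minimal sketch under stated assumptions, not the kernel code: it uses C11 atomics in place of the kernel's atomic_t, and struct demo_dest and every demo_* function are hypothetical names. It shows that whichever of the two final "put and free" calls runs second is the one that takes the count to -1 and frees the object, so no ordering between the trash cleanup and the scheduler's RCU callback is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ip_vs_dest. */
struct demo_dest {
	atomic_int refcnt;	/* >0: in service, 0: in trash, -1: free it */
};

static struct demo_dest *demo_dest_new(void)
{
	struct demo_dest *d = malloc(sizeof(*d));

	if (d)
		atomic_init(&d->refcnt, 1);	/* reference held by the service */
	return d;
}

static void demo_dest_hold(struct demo_dest *d)
{
	atomic_fetch_add(&d->refcnt, 1);
}

/* Cheap put for the fast path: never frees. */
static void demo_dest_put(struct demo_dest *d)
{
	atomic_fetch_sub(&d->refcnt, 1);
}

/* Heavier put: frees when the count drops below 0. Returns true if freed. */
static bool demo_dest_put_and_free(struct demo_dest *d)
{
	/* atomic_fetch_sub() returns the old value, so subtract 1 again. */
	if (atomic_fetch_sub(&d->refcnt, 1) - 1 < 0) {
		free(d);
		return true;
	}
	return false;
}

/* Returns a bitmask: bit 0 if the first final put freed, bit 1 if the second. */
static int demo_run(void)
{
	struct demo_dest *d = demo_dest_new();
	bool first, second;

	if (!d)
		return -1;
	demo_dest_hold(d);			/* scheduler caches the dest: 2 */
	demo_dest_put(d);			/* dest deleted from service: 1 */
	first = demo_dest_put_and_free(d);	/* 1 -> 0, not freed */
	second = demo_dest_put_and_free(d);	/* 0 -> -1, freed */
	return (first ? 1 : 0) | (second ? 2 : 0);
}
```

The two final calls are symmetric, so it does not matter which caller (trash cleanup or scheduler callback) comes first; only the last one frees.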
* [PATCH 3/3] ipvs: improved SH fallback strategy
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Alexander Frolkin, Simon Horman
From: Alexander Frolkin <avf@eldamar.org.uk>
Improve the SH fallback realserver selection strategy.
With sh and sh-fallback, if a realserver is down, this attempts to
distribute the traffic that would have gone to that server evenly
among the remaining servers.
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_sh.c | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 3588fae..cc65b2f 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -115,27 +115,46 @@ ip_vs_sh_get(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
}
-/* As ip_vs_sh_get, but with fallback if selected server is unavailable */
+/* As ip_vs_sh_get, but with fallback if selected server is unavailable
+ *
+ * The fallback strategy loops around the table starting from a "random"
+ * point (in fact, it is chosen to be the original hash value to make the
+ * algorithm deterministic) to find a new server.
+ */
static inline struct ip_vs_dest *
ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
const union nf_inet_addr *addr, __be16 port)
{
- unsigned int offset;
- unsigned int hash;
+ unsigned int offset, roffset;
+ unsigned int hash, ihash;
struct ip_vs_dest *dest;
+ /* first try the dest it's supposed to go to */
+ ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
+ dest = rcu_dereference(s->buckets[ihash].dest);
+ if (!dest)
+ return NULL;
+ if (!is_unavailable(dest))
+ return dest;
+
+ IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting",
+ IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port));
+
+ /* if the original dest is unavailable, loop around the table
+ * starting from ihash to find a new dest
+ */
for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
- hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
+ roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
+ hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
dest = rcu_dereference(s->buckets[hash].dest);
if (!dest)
break;
- if (is_unavailable(dest))
- IP_VS_DBG_BUF(6, "SH: selected unavailable server "
- "%s:%d (offset %d)",
- IP_VS_DBG_ADDR(svc->af, &dest->addr),
- ntohs(dest->port), offset);
- else
+ if (!is_unavailable(dest))
return dest;
+ IP_VS_DBG_BUF(6, "SH: selected unavailable "
+ "server %s:%d (offset %d), reselecting",
+ IP_VS_DBG_ADDR(svc->af, &dest->addr),
+ ntohs(dest->port), roffset);
}
return NULL;
--
1.8.4
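The control flow of the new fallback can be tried out in userspace. This is a minimal sketch, assuming a simplified hash of the form (offset + key * constant) & mask as a stand-in for ip_vs_sh_hashkey(). The table size, struct demo_dest, and all demo_* names are hypothetical, and there is no RCU here. It is meant only to illustrate the "try the primary bucket, then walk the table from a deterministic starting point" logic in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TAB_BITS	3			/* hypothetical, small for demo */
#define DEMO_TAB_SIZE	(1u << DEMO_TAB_BITS)
#define DEMO_TAB_MASK	(DEMO_TAB_SIZE - 1)

/* Hypothetical stand-in for struct ip_vs_dest. */
struct demo_dest {
	int id;
	bool available;
};

/* Simplified stand-in for ip_vs_sh_hashkey(): key replaces addr and port. */
static unsigned int demo_hashkey(uint32_t key, unsigned int offset)
{
	return (offset + key * 2654435761u) & DEMO_TAB_MASK;
}

/* Mirrors the structure of ip_vs_sh_get_fallback() after the patch. */
static struct demo_dest *
demo_get_fallback(struct demo_dest **buckets, uint32_t key)
{
	unsigned int offset, roffset, hash, ihash;
	struct demo_dest *dest;

	assert(buckets != NULL);

	/* first try the dest the key is supposed to go to */
	ihash = demo_hashkey(key, 0);
	dest = buckets[ihash];
	if (!dest)
		return NULL;
	if (dest->available)
		return dest;

	/* primary is unavailable: probe with offsets rotated by ihash */
	for (offset = 0; offset < DEMO_TAB_SIZE; offset++) {
		roffset = (offset + ihash) % DEMO_TAB_SIZE;
		hash = demo_hashkey(key, roffset);
		dest = buckets[hash];
		if (!dest)
			break;
		if (dest->available)
			return dest;
	}
	return NULL;
}
```

Keys whose primary server is available are unaffected. Only keys that hash to an unavailable server are re-probed, and because the starting offset depends on the key's own hash, different keys begin their walk at different table positions.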
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
@ 2013-10-16 10:43 ` Pablo Neira Ayuso
2013-10-16 19:52 ` Julian Anastasov
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2013-10-16 10:43 UTC
To: Simon Horman
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov
On Tue, Oct 15, 2013 at 11:01:46AM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
>
> commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
> rcu_barrier() on cleanup to wait dest users and schedulers
> like LBLC and LBLCR to put their last dest reference.
> Using rcu_barrier with many namespaces is problematic.
>
> Trying to fix it by freeing dest with kfree_rcu is not
> a solution, RCU callbacks can run in parallel and execution
> order is random.
>
> Fix it by creating new function ip_vs_dest_put_and_free()
> which is heavier than ip_vs_dest_put(). We will use it just
> for schedulers like LBLC, LBLCR that can delay their dest
> release.
>
> By default, dests reference is above 0 if they are present in
> service and it is 0 when deleted but still in trash list.
> Change the dest trash code to use ip_vs_dest_put_and_free(),
> so that refcnt -1 can be used for freeing. As result,
> such checks remain in slow path and the rcu_barrier() from
> netns cleanup can be removed.
I can enqueue this fix to nf if you like. No need to resend, I can
manually apply.
Let me know.
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-16 10:43 ` Pablo Neira Ayuso
@ 2013-10-16 19:52 ` Julian Anastasov
2013-10-17 0:49 ` Simon Horman
0 siblings, 1 reply; 9+ messages in thread
From: Julian Anastasov @ 2013-10-16 19:52 UTC
To: Pablo Neira Ayuso
Cc: Simon Horman, lvs-devel, netdev, netfilter-devel, Wensong Zhang
Hello,
On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> I can enqueue this fix to nf if you like. No need to resend, I can
> manually apply.
>
> Let me know.
It is not critical. I waited weeks for the net tree to be
copied into net-next because it collides with the recent
"ipvs: make the service replacement more robust" change in
the net tree :) But if an rcu_barrier in the netns cleanup looks
scary enough you can push it to nf. IMHO, it just adds
unneeded delay there.
Regards
--
Julian Anastasov <ja@ssi.bg>
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-16 19:52 ` Julian Anastasov
@ 2013-10-17 0:49 ` Simon Horman
2013-10-17 8:11 ` Pablo Neira Ayuso
0 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2013-10-17 0:49 UTC
To: Julian Anastasov
Cc: Pablo Neira Ayuso, lvs-devel, netdev, netfilter-devel,
Wensong Zhang
On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
>
> > I can enqueue this fix to nf if you like. No need to resend, I can
> > manually apply.
> >
> > Let me know.
>
> It is not critical. I waited weeks the net tree to be
> copied into net-next because it collides with the recent
> "ipvs: make the service replacement more robust" change in
> net tree :) But if a rcu_barrier in the netns cleanup looks
> scary enough you can push it to nf. IMHO, it just adds
> unneeded delay there.
If it is not critical I would prefer for it to travel through
nf-next. Though I do not feel strongly about this.
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-17 0:49 ` Simon Horman
@ 2013-10-17 8:11 ` Pablo Neira Ayuso
2013-10-17 8:30 ` Simon Horman
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2013-10-17 8:11 UTC
To: Simon Horman
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel,
Wensong Zhang
On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> >
> > Hello,
> >
> > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> >
> > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > manually apply.
> > >
> > > Let me know.
> >
> > It is not critical. I waited weeks the net tree to be
> > copied into net-next because it collides with the recent
> > "ipvs: make the service replacement more robust" change in
> > net tree :) But if a rcu_barrier in the netns cleanup looks
> > scary enough you can push it to nf. IMHO, it just adds
> > unneeded delay there.
>
> If it is not critical I would prefer for it to travel through
> nf-next. Though I do not feel strongly about this.
Will enqueue for nf-next.
I'd appreciate it if you could revive the tradition of attaching a short
evaluation in the cover letter, as I do when I send pull requests to
David. Thanks!
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-17 8:11 ` Pablo Neira Ayuso
@ 2013-10-17 8:30 ` Simon Horman
0 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-17 8:30 UTC
To: Pablo Neira Ayuso
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel,
Wensong Zhang
On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > >
> > > Hello,
> > >
> > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > >
> > > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > > manually apply.
> > > >
> > > > Let me know.
> > >
> > > It is not critical. I waited weeks the net tree to be
> > > copied into net-next because it collides with the recent
> > > "ipvs: make the service replacement more robust" change in
> > > net tree :) But if a rcu_barrier in the netns cleanup looks
> > > scary enough you can push it to nf. IMHO, it just adds
> > > unneeded delay there.
> >
> > If it is not critical I would prefer for it to travel through
> > nf-next. Though I do not feel strongly about this.
>
> Will enqueue for nf-next.
>
> I'd appreciate if you can recover the tradition of attaching a short
> evaluation in the cover letter as I do when I send pull requests to
> David. Thanks!
Sure, will do.