* [GIT PULL] IPVS updates for v3.13
@ 2013-10-15 2:01 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman
Hi Pablo,
please consider the following fixes for IPVS for v3.13.
This pull request is based on nf-next.
The following changes since commit 58308451e91974267e1f4a618346055342019e02:
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next (2013-10-10 15:29:44 -0400)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v3.13
for you to fetch changes up to 1255ce5f10dbb4646c8d43b8d59faab48ae4a6b2:
ipvs: improved SH fallback strategy (2013-10-15 10:54:50 +0900)
----------------------------------------------------------------
IPVS updates for v3.13
* Improvements to SH fallback strategy
* Avoid rcu_barrier during netns cleanup
* Fix the IPVS_CMD_ATTR_MAX definition
----------------------------------------------------------------
Alexander Frolkin (1):
ipvs: improved SH fallback strategy
Julian Anastasov (2):
ipvs: fix the IPVS_CMD_ATTR_MAX definition
ipvs: avoid rcu_barrier during netns cleanup
include/net/ip_vs.h | 6 ++++++
include/uapi/linux/ip_vs.h | 2 +-
net/netfilter/ipvs/ip_vs_ctl.c | 6 +-----
net/netfilter/ipvs/ip_vs_lblc.c | 2 +-
net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
net/netfilter/ipvs/ip_vs_sh.c | 39 +++++++++++++++++++++++++++++----------
6 files changed, 39 insertions(+), 18 deletions(-)
* [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
2013-10-15 2:01 ` [PATCH 3/3] ipvs: improved SH fallback strategy Simon Horman
2 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman
From: Julian Anastasov <ja@ssi.bg>
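The definition was wrong: it was derived from __IPVS_SVC_ATTR_MAX
instead of __IPVS_CMD_ATTR_MAX, so the value was bigger than intended.
The problem is harmless.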
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/uapi/linux/ip_vs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
index 2945822..fbcffe8 100644
--- a/include/uapi/linux/ip_vs.h
+++ b/include/uapi/linux/ip_vs.h
@@ -334,7 +334,7 @@ enum {
__IPVS_CMD_ATTR_MAX,
};
-#define IPVS_CMD_ATTR_MAX (__IPVS_SVC_ATTR_MAX - 1)
+#define IPVS_CMD_ATTR_MAX (__IPVS_CMD_ATTR_MAX - 1)
/*
* Attributes used to describe a service
--
1.8.4
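The sentinel idiom that patch 1/3 corrects can be sketched in plain C as follows. This is an illustration only: the DEMO_* enums, their members, and demo_table_slots() are hypothetical stand-ins, not the real IPVS attribute lists. The point is that a MAX macro derived from the wrong enum's sentinel still compiles, and when the other enum happens to be longer it merely yields a larger bound. That an oversized bound only wastes table slots is an assumption about why the commit message calls the problem harmless.

```c
#include <assert.h>

/* Hypothetical command attributes; the last member is a sentinel. */
enum {
	DEMO_CMD_ATTR_UNSPEC = 0,
	DEMO_CMD_ATTR_SERVICE,
	DEMO_CMD_ATTR_DEST,
	DEMO_CMD_ATTR_SENTINEL,		/* one past the last command attribute */
};

/* Hypothetical service attributes; a longer list than the one above. */
enum {
	DEMO_SVC_ATTR_UNSPEC = 0,
	DEMO_SVC_ATTR_AF,
	DEMO_SVC_ATTR_PROTOCOL,
	DEMO_SVC_ATTR_ADDR,
	DEMO_SVC_ATTR_PORT,
	DEMO_SVC_ATTR_SENTINEL,		/* one past the last service attribute */
};

/* Wrong: derived from the service sentinel, as in the old definition. */
#define DEMO_CMD_ATTR_MAX_WRONG	(DEMO_SVC_ATTR_SENTINEL - 1)
/* Right: derived from the command sentinel, as in the fixed definition. */
#define DEMO_CMD_ATTR_MAX	(DEMO_CMD_ATTR_SENTINEL - 1)

/* Number of slots a table indexed by attribute number would need. */
static inline int demo_table_slots(int max_attr)
{
	assert(max_attr >= 0);
	return max_attr + 1;
}
```

With these stand-in enums the wrong macro evaluates to 4 and the fixed one to 2, so a table sized from the wrong macro would carry two unused slots.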
* [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2013-10-16 10:43 ` Pablo Neira Ayuso
2013-10-15 2:01 ` [PATCH 3/3] ipvs: improved SH fallback strategy Simon Horman
2 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Simon Horman
From: Julian Anastasov <ja@ssi.bg>
commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
rcu_barrier() on cleanup to wait for dest users and schedulers
like LBLC and LBLCR to put their last dest reference.
Using rcu_barrier() with many namespaces is problematic.
Trying to fix it by freeing dest with kfree_rcu() is not
a solution: RCU callbacks can run in parallel and their
execution order is random.
Fix it by creating a new function, ip_vs_dest_put_and_free(),
which is heavier than ip_vs_dest_put(). We will use it only
for schedulers like LBLC and LBLCR that can delay their dest
release.
By default, a dest's reference count is above 0 if it is present
in a service and it is 0 when deleted but still in the trash list.
Change the dest trash code to use ip_vs_dest_put_and_free(),
so that refcnt -1 can be used for freeing. As a result,
such checks remain in the slow path and the rcu_barrier() from
netns cleanup can be removed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 6 ++++++
net/netfilter/ipvs/ip_vs_ctl.c | 6 +-----
net/netfilter/ipvs/ip_vs_lblc.c | 2 +-
net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
4 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..cd7275f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1442,6 +1442,12 @@ static inline void ip_vs_dest_put(struct ip_vs_dest *dest)
atomic_dec(&dest->refcnt);
}
+static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
+{
+ if (atomic_dec_return(&dest->refcnt) < 0)
+ kfree(dest);
+}
+
/*
* IPVS sync daemon data and function prototypes
* (from ip_vs_sync.c)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index a3df9bd..62786a4 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -704,7 +704,7 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest)
__ip_vs_dst_cache_reset(dest);
__ip_vs_svc_put(svc, false);
free_percpu(dest->stats.cpustats);
- kfree(dest);
+ ip_vs_dest_put_and_free(dest);
}
/*
@@ -3820,10 +3820,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
- /* Some dest can be in grace period even before cleanup, we have to
- * defer ip_vs_trash_cleanup until ip_vs_dest_wait_readers is called.
- */
- rcu_barrier();
ip_vs_trash_cleanup(net);
ip_vs_stop_estimator(net, &ipvs->tot_stats);
ip_vs_control_net_cleanup_sysctl(net);
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index eff13c9..ca056a3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -136,7 +136,7 @@ static void ip_vs_lblc_rcu_free(struct rcu_head *head)
struct ip_vs_lblc_entry,
rcu_head);
- ip_vs_dest_put(en->dest);
+ ip_vs_dest_put_and_free(en->dest);
kfree(en);
}
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 0b85500..3f21a2f 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -130,7 +130,7 @@ static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head)
struct ip_vs_dest_set_elem *e;
e = container_of(head, struct ip_vs_dest_set_elem, rcu_head);
- ip_vs_dest_put(e->dest);
+ ip_vs_dest_put_and_free(e->dest);
kfree(e);
}
--
1.8.4
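The "refcnt -1 frees" scheme described in patch 2/3 can be modelled in user space with C11 atomics. This is a minimal sketch under stated assumptions, not the kernel code: struct demo_dest and the demo_* functions are hypothetical, malloc()/free() stand in for the kernel allocator, and demo_dest_put_and_free() returns a bool (the kernel helper returns void) purely so that a caller can observe which call performed the free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ip_vs_dest: only the refcount matters. */
struct demo_dest {
	atomic_int refcnt;
};

/* Allocate a dest holding 'refs' references; returns NULL on failure. */
static struct demo_dest *demo_dest_new(int refs)
{
	struct demo_dest *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refcnt, refs);
	return d;
}

/* Cheap put for the fast path: drop a reference, never free. */
static void demo_dest_put(struct demo_dest *d)
{
	assert(d != NULL);
	atomic_fetch_sub(&d->refcnt, 1);
}

/* Heavier put for the slow path: whoever drives the count below zero
 * frees the object. atomic_fetch_sub() returns the old value, so the
 * new value is (old - 1). Returns true if this call freed the dest.
 */
static bool demo_dest_put_and_free(struct demo_dest *d)
{
	assert(d != NULL);
	if (atomic_fetch_sub(&d->refcnt, 1) - 1 < 0) {
		free(d);
		return true;
	}
	return false;
}
```

In this model a dest sitting in the trash list with one delayed scheduler reference has a count of 1. The trash cleanup and the scheduler's release both call demo_dest_put_and_free(); the first call takes the count to 0 and the second to -1, so the free happens in whichever call comes last, and no ordering between the two has to be enforced.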
* [PATCH 3/3] ipvs: improved SH fallback strategy
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
@ 2013-10-15 2:01 ` Simon Horman
2 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-15 2:01 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov, Alexander Frolkin, Simon Horman
From: Alexander Frolkin <avf@eldamar.org.uk>
Improve the SH fallback realserver selection strategy.
With sh and sh-fallback, if a realserver is down, this attempts to
distribute the traffic that would have gone to that server evenly
among the remaining servers.
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
net/netfilter/ipvs/ip_vs_sh.c | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 3588fae..cc65b2f 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -115,27 +115,46 @@ ip_vs_sh_get(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
}
-/* As ip_vs_sh_get, but with fallback if selected server is unavailable */
+/* As ip_vs_sh_get, but with fallback if selected server is unavailable
+ *
+ * The fallback strategy loops around the table starting from a "random"
+ * point (in fact, it is chosen to be the original hash value to make the
+ * algorithm deterministic) to find a new server.
+ */
static inline struct ip_vs_dest *
ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
const union nf_inet_addr *addr, __be16 port)
{
- unsigned int offset;
- unsigned int hash;
+ unsigned int offset, roffset;
+ unsigned int hash, ihash;
struct ip_vs_dest *dest;
+ /* first try the dest it's supposed to go to */
+ ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
+ dest = rcu_dereference(s->buckets[ihash].dest);
+ if (!dest)
+ return NULL;
+ if (!is_unavailable(dest))
+ return dest;
+
+ IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting",
+ IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port));
+
+ /* if the original dest is unavailable, loop around the table
+ * starting from ihash to find a new dest
+ */
for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
- hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
+ roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
+ hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
dest = rcu_dereference(s->buckets[hash].dest);
if (!dest)
break;
- if (is_unavailable(dest))
- IP_VS_DBG_BUF(6, "SH: selected unavailable server "
- "%s:%d (offset %d)",
- IP_VS_DBG_ADDR(svc->af, &dest->addr),
- ntohs(dest->port), offset);
- else
+ if (!is_unavailable(dest))
return dest;
+ IP_VS_DBG_BUF(6, "SH: selected unavailable "
+ "server %s:%d (offset %d), reselecting",
+ IP_VS_DBG_ADDR(svc->af, &dest->addr),
+ ntohs(dest->port), roffset);
}
return NULL;
--
1.8.4
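The fallback loop from patch 3/3 can be exercised outside the kernel with a small model. This is a sketch under assumptions: struct demo_dest, struct demo_state, the table size, and demo_hashkey() are hypothetical stand-ins (in particular, demo_hashkey() is only assumed to resemble ip_vs_sh_hashkey() by adding the offset to a multiplicative hash of address and port), and the RCU dereference and debug logging are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TAB_BITS	8
#define DEMO_TAB_SIZE	(1u << DEMO_TAB_BITS)
#define DEMO_TAB_MASK	(DEMO_TAB_SIZE - 1)

/* Hypothetical real server: an id plus an availability flag. */
struct demo_dest {
	int id;
	bool available;
};

/* Hypothetical scheduler state: one dest pointer per bucket. */
struct demo_state {
	struct demo_dest *buckets[DEMO_TAB_SIZE];
};

/* Stand-in hash (an assumption, not the kernel function): the offset
 * shifts the bucket index produced from address and port.
 */
static unsigned int demo_hashkey(uint32_t addr, uint16_t port,
				 unsigned int offset)
{
	uint32_t base = (uint32_t)(addr + port) * 2654435761u;

	return (offset + base) & DEMO_TAB_MASK;
}

/* Try the dest the hash points at first; if it is unavailable, walk
 * the table starting from the original hash value, which keeps the
 * choice deterministic for a given address and port.
 */
static struct demo_dest *demo_get_fallback(const struct demo_state *s,
					   uint32_t addr, uint16_t port)
{
	unsigned int offset, roffset, hash, ihash;
	struct demo_dest *dest;

	assert(s != NULL);
	ihash = demo_hashkey(addr, port, 0);
	dest = s->buckets[ihash];
	if (!dest)
		return NULL;
	if (dest->available)
		return dest;

	for (offset = 0; offset < DEMO_TAB_SIZE; offset++) {
		roffset = (offset + ihash) % DEMO_TAB_SIZE;
		hash = demo_hashkey(addr, port, roffset);
		dest = s->buckets[hash];
		if (!dest)
			break;
		if (dest->available)
			return dest;
	}
	return NULL;
}
```

Because the starting offset depends on the client's own hash value, clients whose primary server is down begin their walk at different points in the table, which is how the displaced traffic is meant to spread over the remaining servers rather than all landing on one neighbour.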
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
@ 2013-10-16 10:43 ` Pablo Neira Ayuso
2013-10-16 19:52 ` Julian Anastasov
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2013-10-16 10:43 UTC (permalink / raw)
To: Simon Horman
Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
Julian Anastasov
On Tue, Oct 15, 2013 at 11:01:46AM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
>
> commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
> rcu_barrier() on cleanup to wait dest users and schedulers
> like LBLC and LBLCR to put their last dest reference.
> Using rcu_barrier with many namespaces is problematic.
>
> Trying to fix it by freeing dest with kfree_rcu is not
> a solution, RCU callbacks can run in parallel and execution
> order is random.
>
> Fix it by creating new function ip_vs_dest_put_and_free()
> which is heavier than ip_vs_dest_put(). We will use it just
> for schedulers like LBLC, LBLCR that can delay their dest
> release.
>
> By default, dests reference is above 0 if they are present in
> service and it is 0 when deleted but still in trash list.
> Change the dest trash code to use ip_vs_dest_put_and_free(),
> so that refcnt -1 can be used for freeing. As result,
> such checks remain in slow path and the rcu_barrier() from
> netns cleanup can be removed.
I can enqueue this fix to nf if you like. No need to resend, I can
manually apply.
Let me know.
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-16 10:43 ` Pablo Neira Ayuso
@ 2013-10-16 19:52 ` Julian Anastasov
2013-10-17 0:49 ` Simon Horman
0 siblings, 1 reply; 9+ messages in thread
From: Julian Anastasov @ 2013-10-16 19:52 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Simon Horman, lvs-devel, netdev, netfilter-devel, Wensong Zhang
Hello,
On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> I can enqueue this fix to nf if you like. No need to resend, I can
> manually apply.
>
> Let me know.
It is not critical. I waited weeks for the net tree to be
merged into net-next because it collides with the recent
"ipvs: make the service replacement more robust" change in
net tree :) But if an rcu_barrier() in the netns cleanup looks
scary enough you can push it to nf. IMHO, it just adds
unneeded delay there.
Regards
--
Julian Anastasov <ja@ssi.bg>
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-16 19:52 ` Julian Anastasov
@ 2013-10-17 0:49 ` Simon Horman
2013-10-17 8:11 ` Pablo Neira Ayuso
0 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2013-10-17 0:49 UTC (permalink / raw)
To: Julian Anastasov
Cc: Pablo Neira Ayuso, lvs-devel, netdev, netfilter-devel,
Wensong Zhang
On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
>
> > I can enqueue this fix to nf if you like. No need to resend, I can
> > manually apply.
> >
> > Let me know.
>
> It is not critical. I waited weeks the net tree to be
> copied into net-next because it collides with the recent
> "ipvs: make the service replacement more robust" change in
> net tree :) But if a rcu_barrier in the netns cleanup looks
> scary enough you can push it to nf. IMHO, it just adds
> unneeded delay there.
If it is not critical I would prefer for it to travel through
nf-next. Though I do not feel strongly about this.
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-17 0:49 ` Simon Horman
@ 2013-10-17 8:11 ` Pablo Neira Ayuso
2013-10-17 8:30 ` Simon Horman
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2013-10-17 8:11 UTC (permalink / raw)
To: Simon Horman
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel,
Wensong Zhang
On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> >
> > Hello,
> >
> > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> >
> > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > manually apply.
> > >
> > > Let me know.
> >
> > It is not critical. I waited weeks the net tree to be
> > copied into net-next because it collides with the recent
> > "ipvs: make the service replacement more robust" change in
> > net tree :) But if a rcu_barrier in the netns cleanup looks
> > scary enough you can push it to nf. IMHO, it just adds
> > unneeded delay there.
>
> If it is not critical I would prefer for it to travel through
> nf-next. Though I do not feel strongly about this.
Will enqueue for nf-next.
I'd appreciate if you can recover the tradition of attaching a short
evaluation in the cover letter as I do when I send pull requests to
David. Thanks!
* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
2013-10-17 8:11 ` Pablo Neira Ayuso
@ 2013-10-17 8:30 ` Simon Horman
0 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2013-10-17 8:30 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel,
Wensong Zhang
On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > >
> > > Hello,
> > >
> > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > >
> > > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > > manually apply.
> > > >
> > > > Let me know.
> > >
> > > It is not critical. I waited weeks the net tree to be
> > > copied into net-next because it collides with the recent
> > > "ipvs: make the service replacement more robust" change in
> > > net tree :) But if a rcu_barrier in the netns cleanup looks
> > > scary enough you can push it to nf. IMHO, it just adds
> > > unneeded delay there.
> >
> > If it is not critical I would prefer for it to travel through
> > nf-next. Though I do not feel strongly about this.
>
> Will enqueue for nf-next.
>
> I'd appreciate if you can recover the tradition of attaching a short
> evaluation in the cover letter as I do when I send pull requests to
> David. Thanks!
Sure, will do.
Thread overview: 9+ messages
2013-10-15 2:01 [GIT PULL] IPVS updates for v3.13 Simon Horman
2013-10-15 2:01 ` [PATCH 1/3] ipvs: fix the IPVS_CMD_ATTR_MAX definition Simon Horman
2013-10-15 2:01 ` [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup Simon Horman
2013-10-16 10:43 ` Pablo Neira Ayuso
2013-10-16 19:52 ` Julian Anastasov
2013-10-17 0:49 ` Simon Horman
2013-10-17 8:11 ` Pablo Neira Ayuso
2013-10-17 8:30 ` Simon Horman
2013-10-15 2:01 ` [PATCH 3/3] ipvs: improved SH fallback strategy Simon Horman