All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values
@ 2015-02-24  5:07 Alex Gartrell
  2015-02-24  5:07 ` [RFC PATCH net-next 1/4] ipvs: move ip_vs_trash_put_dest into its own function Alex Gartrell
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Alex Gartrell @ 2015-02-24  5:07 UTC (permalink / raw)
  To: wensong, horms, ja; +Cc: lvs-devel, kernel-team, Alex Gartrell

We have a custom scheduler that is failing some atomic allocations
sometimes, and it was made pretty hard to discover by the fact that we more
or less drop the return code on the floor.  This is especially nasty, as
the ipvsadm -ln invocation shows the scheduler to be as we wish it were
instead of as it is.

This is an RFC because I have no idea what the best way to proceed with the
upd_dest invocation is.  I could clone all of the state and simply restore
it, but I figured one of you might have a better idea.

Alex Gartrell (4):
  ipvs: move ip_vs_trash_put_dest into its own function
  ipvs: drop dest_p from ip_vs_new_dest
  ipvs: check return value of add_dest
  ipvs: check return value of del_dest

 net/netfilter/ipvs/ip_vs_ctl.c | 106 +++++++++++++++++++++++++----------------
 1 file changed, 66 insertions(+), 40 deletions(-)

-- 
Alex Gartrell <agartrell@fb.com>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH net-next 1/4] ipvs: move ip_vs_trash_put_dest into its own function
  2015-02-24  5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
@ 2015-02-24  5:07 ` Alex Gartrell
  2015-02-24  5:07 ` [RFC PATCH net-next 2/4] ipvs: drop dest_p from ip_vs_new_dest Alex Gartrell
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Alex Gartrell @ 2015-02-24  5:07 UTC (permalink / raw)
  To: wensong, horms, ja; +Cc: lvs-devel, kernel-team, Alex Gartrell

This corresponds better with ip_vs_trash_get_dest and we'll need to all it
directly if the scheduler add dest fails.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index e557590..11da053 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -696,6 +696,24 @@ out:
 	return dest;
 }
 
+static inline void ip_vs_trash_put_dest(struct netns_ipvs *ipvs,
+					struct ip_vs_dest *dest,
+					bool cleanup)
+{
+	spin_lock_bh(&ipvs->dest_trash_lock);
+	IP_VS_DBG_BUF(3, "Moving dest %s:%u into trash, dest->refcnt=%d\n",
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port),
+		      atomic_read(&dest->refcnt));
+	if (list_empty(&ipvs->dest_trash) && !cleanup)
+		mod_timer(&ipvs->dest_trash_timer,
+			  jiffies + (IP_VS_DEST_TRASH_PERIOD >> 1));
+	/* dest lives in trash without reference */
+	list_add(&dest->t_list, &ipvs->dest_trash);
+	dest->idle_start = 0;
+	spin_unlock_bh(&ipvs->dest_trash_lock);
+	ip_vs_dest_put(dest);
+}
+
 static void ip_vs_dest_free(struct ip_vs_dest *dest)
 {
 	struct ip_vs_service *svc = rcu_dereference_protected(dest->svc, 1);
@@ -1032,18 +1050,7 @@ static void __ip_vs_del_dest(struct net *net, struct ip_vs_dest *dest,
 	 */
 	ip_vs_rs_unhash(dest);
 
-	spin_lock_bh(&ipvs->dest_trash_lock);
-	IP_VS_DBG_BUF(3, "Moving dest %s:%u into trash, dest->refcnt=%d\n",
-		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port),
-		      atomic_read(&dest->refcnt));
-	if (list_empty(&ipvs->dest_trash) && !cleanup)
-		mod_timer(&ipvs->dest_trash_timer,
-			  jiffies + (IP_VS_DEST_TRASH_PERIOD >> 1));
-	/* dest lives in trash without reference */
-	list_add(&dest->t_list, &ipvs->dest_trash);
-	dest->idle_start = 0;
-	spin_unlock_bh(&ipvs->dest_trash_lock);
-	ip_vs_dest_put(dest);
+	ip_vs_trash_put_dest(ipvs, dest, cleanup);
 }
 
 
-- 
Alex Gartrell <agartrell@fb.com>


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH net-next 2/4] ipvs: drop dest_p from ip_vs_new_dest
  2015-02-24  5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
  2015-02-24  5:07 ` [RFC PATCH net-next 1/4] ipvs: move ip_vs_trash_put_dest into its own function Alex Gartrell
@ 2015-02-24  5:07 ` Alex Gartrell
  2015-02-24  5:07 ` [RFC PATCH net-next 3/4] ipvs: check return value of add_dest Alex Gartrell
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Alex Gartrell @ 2015-02-24  5:07 UTC (permalink / raw)
  To: wensong, horms, ja; +Cc: lvs-devel, kernel-team, Alex Gartrell

We don't actually use this in any way, so just drop it as an argument.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 11da053..6341e3f 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -863,8 +863,7 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
  *	Create a destination for the given service
  */
 static int
-ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
-	       struct ip_vs_dest **dest_p)
+ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 {
 	struct ip_vs_dest *dest;
 	unsigned int atype, i;
@@ -918,8 +917,6 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
 	spin_lock_init(&dest->stats.lock);
 	__ip_vs_update_dest(svc, dest, udest, 1);
 
-	*dest_p = dest;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH net-next 3/4] ipvs: check return value of add_dest
  2015-02-24  5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
  2015-02-24  5:07 ` [RFC PATCH net-next 1/4] ipvs: move ip_vs_trash_put_dest into its own function Alex Gartrell
  2015-02-24  5:07 ` [RFC PATCH net-next 2/4] ipvs: drop dest_p from ip_vs_new_dest Alex Gartrell
@ 2015-02-24  5:07 ` Alex Gartrell
  2015-02-24  5:07 ` [RFC PATCH net-next 4/4] ipvs: check return value of del_dest Alex Gartrell
  2015-02-24  7:30 ` [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Julian Anastasov
  4 siblings, 0 replies; 8+ messages in thread
From: Alex Gartrell @ 2015-02-24  5:07 UTC (permalink / raw)
  To: wensong, horms, ja; +Cc: lvs-devel, kernel-team, Alex Gartrell

If the scheduler's add_dest function fails, free the newly allocated or
reclaimed dest and then return the error code to the user.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 6341e3f..d384b7f 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -787,7 +787,7 @@ ip_vs_zero_stats(struct ip_vs_stats *stats)
 /*
  *	Update a destination in the given service
  */
-static void
+static int
 __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
 		    struct ip_vs_dest_user_kern *udest, int add)
 {
@@ -795,6 +795,7 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
 	struct ip_vs_service *old_svc;
 	struct ip_vs_scheduler *sched;
 	int conn_flags;
+	int ret = 0;
 
 	/* We cannot modify an address and change the address family */
 	BUG_ON(!add && udest->af != dest->af);
@@ -850,12 +851,20 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
 		ip_vs_start_estimator(svc->net, &dest->stats);
 		list_add_rcu(&dest->n_list, &svc->destinations);
 		svc->num_dests++;
-		if (sched->add_dest)
-			sched->add_dest(svc, dest);
+		if (sched->add_dest) {
+			ret = sched->add_dest(svc, dest);
+			if (ret) {
+				ip_vs_stop_estimator(svc->net, &dest->stats);
+				list_del_rcu(&dest->n_list);
+				svc->num_dests--;
+			}
+		}
 	} else {
 		if (sched->upd_dest)
 			sched->upd_dest(svc, dest);
 	}
+
+	return ret;
 }
 
 
@@ -867,6 +876,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 {
 	struct ip_vs_dest *dest;
 	unsigned int atype, i;
+	int ret;
 
 	EnterFunction(2);
 
@@ -915,10 +925,13 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 	INIT_HLIST_NODE(&dest->d_list);
 	spin_lock_init(&dest->dst_lock);
 	spin_lock_init(&dest->stats.lock);
-	__ip_vs_update_dest(svc, dest, udest, 1);
+
+	ret = __ip_vs_update_dest(svc, dest, udest, 1);
+	if (ret)
+		kfree(dest);
 
 	LeaveFunction(2);
-	return 0;
+	return ret;
 
 err_alloc:
 	kfree(dest);
@@ -977,8 +990,9 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 			      IP_VS_DBG_ADDR(svc->af, &dest->vaddr),
 			      ntohs(dest->vport));
 
-		__ip_vs_update_dest(svc, dest, udest, 1);
-		ret = 0;
+		ret = __ip_vs_update_dest(svc, dest, udest, 1);
+		if (ret)
+			ip_vs_trash_put_dest(net_ipvs(svc->net), dest, false);
 	} else {
 		/*
 		 * Allocate and initialize the dest structure
-- 
Alex Gartrell <agartrell@fb.com>


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH net-next 4/4] ipvs: check return value of del_dest
  2015-02-24  5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
                   ` (2 preceding siblings ...)
  2015-02-24  5:07 ` [RFC PATCH net-next 3/4] ipvs: check return value of add_dest Alex Gartrell
@ 2015-02-24  5:07 ` Alex Gartrell
  2015-02-24  7:30 ` [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Julian Anastasov
  4 siblings, 0 replies; 8+ messages in thread
From: Alex Gartrell @ 2015-02-24  5:07 UTC (permalink / raw)
  To: wensong, horms, ja; +Cc: lvs-devel, kernel-team, Alex Gartrell

If del_dest returns non-zero, restore flags and the counts in services and
ipvs and leave things as they were.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index d384b7f..2e9deb4 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1068,10 +1068,13 @@ static void __ip_vs_del_dest(struct net *net, struct ip_vs_dest *dest,
 /*
  *	Unlink a destination from the given service
  */
-static void __ip_vs_unlink_dest(struct ip_vs_service *svc,
-				struct ip_vs_dest *dest,
-				int svcupd)
+static int __ip_vs_unlink_dest(struct ip_vs_service *svc,
+			       struct ip_vs_dest *dest,
+			       int svcupd)
 {
+	int ret = 0;
+	int old_flags = dest->flags;
+
 	dest->flags &= ~IP_VS_DEST_F_AVAILABLE;
 
 	/*
@@ -1080,16 +1083,25 @@ static void __ip_vs_unlink_dest(struct ip_vs_service *svc,
 	list_del_rcu(&dest->n_list);
 	svc->num_dests--;
 
-	if (dest->af != svc->af)
-		net_ipvs(svc->net)->mixed_address_family_dests--;
 
 	if (svcupd) {
 		struct ip_vs_scheduler *sched;
 
 		sched = rcu_dereference_protected(svc->scheduler, 1);
-		if (sched->del_dest)
-			sched->del_dest(svc, dest);
+		if (sched->del_dest) {
+			ret = sched->del_dest(svc, dest);
+			if (ret) {
+				list_add_rcu(&dest->n_list, &svc->destinations);
+				dest->flags = old_flags;
+				svc->num_dests++;
+			}
+		}
 	}
+
+	if (!ret && dest->af != svc->af)
+		net_ipvs(svc->net)->mixed_address_family_dests--;
+
+	return ret;
 }
 
 
@@ -1101,6 +1113,7 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 {
 	struct ip_vs_dest *dest;
 	__be16 dport = udest->port;
+	int ret;
 
 	EnterFunction(2);
 
@@ -1114,19 +1127,14 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 		return -ENOENT;
 	}
 
-	/*
-	 *	Unlink dest from the service
-	 */
-	__ip_vs_unlink_dest(svc, dest, 1);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values
  2015-02-24  5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
                   ` (3 preceding siblings ...)
  2015-02-24  5:07 ` [RFC PATCH net-next 4/4] ipvs: check return value of del_dest Alex Gartrell
@ 2015-02-24  7:30 ` Julian Anastasov
  2015-02-24  7:51   ` Alex Gartrell
  4 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2015-02-24  7:30 UTC (permalink / raw)
  To: Alex Gartrell; +Cc: wensong, horms, lvs-devel, kernel-team


	Hello,

On Mon, 23 Feb 2015, Alex Gartrell wrote:

> We have a custom scheduler that is failing some atomic allocations

	You can use GFP_KERNEL from add_dest method.

> sometimes, and it was made pretty hard to discover by the fact that we more
> or less drop the return code on the floor.  This is especially nasty, as
> the ipvsadm -ln invocation shows the scheduler to be as we wish it were
> instead of as it is.
> 
> This is an RFC because I have no idea what the best way to proceed with the
> upd_dest invocation is.  I could clone all of the state and simply restore
> it, but I figured one of you might have a better idea.

	Can a new create_dest(svc, bool for_update) method
solve such problem? Such modifications are serialized, so
create_dest can attach new structure to sched_data just before
the add_dest/upd_dest call. IPVS does not clone dests on
update but scheduler may need this for RCU update.
create_dest should be called early in __ip_vs_update_dest.
Note that kfree(dest) is not enough, see ip_vs_dest_free(),
more fields may be allocated before __ip_vs_update_dest is
reached.

Patch 2:
	dest_p is not needed, this patch is ok

Patch 3:
	In the __ip_vs_update_dest change you are missing
	that ip_vs_rs_hash was called. In fact, I think we
	should call __ip_vs_unlink_dest(svc, dest, 0) and
	__ip_vs_del_dest(svc->net, dest, false)
	from ip_vs_add_dest() when both __ip_vs_update_dest(, 1)
	and ip_vs_new_dest() fail. This should work because
	add_dest call is the last thing we do on ADD, so
	the DEL operation but without calling del_dest should
	be ok. Such change invalidates Patch 1.

	Why above solution is preferred:

	At first look, scheduler that can return error from add_dest
	should not provide this dest to RCU readers via its schedule
	method and kfree(dest) looks ok.

	The problem is with the ip_vs_has_real_service() call
	and the ip_vs_rs_hash() call we do before add_dest,
	dest still can be accessed by RCU readers.
	For this reason we can not just kfree(dest).

Patch 1:
	Not needed if patch 3 is changed

Patch 4:
	As for del_dest result, failure should not happen,
we should be able to flush dests on module cleanup. I think, there
should not be any allocations in this case.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values
  2015-02-24  7:30 ` [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Julian Anastasov
@ 2015-02-24  7:51   ` Alex Gartrell
  2015-02-24  9:17     ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Gartrell @ 2015-02-24  7:51 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: wensong, horms, lvs-devel, kernel-team

Hey Julian,

Thank you for your quick and thoughtful response :)

On 2/23/15 11:30 PM, Julian Anastasov wrote:
>> We have a custom scheduler that is failing some atomic allocations
>
> 	You can use GFP_KERNEL from add_dest method.
>

That should definitely improve things for us.

>
> 	Can a new create_dest(svc, bool for_update) method
> solve such problem? Such modifications are serialized, so
> create_dest can attach new structure to sched_data just before
> the add_dest/upd_dest call. IPVS does not clone dests on
> update but scheduler may need this for RCU update.
> create_dest should be called early in __ip_vs_update_dest.
> Note that kfree(dest) is not enough, see ip_vs_dest_free(),
> more fields may be allocated before __ip_vs_update_dest is
> reached.

More context on the custom scheduler:

It's a consistent hash ring algorithm which requires sorting all of the 
members and then inserting them one at a time (a entry per weight).  It 
requires a massive allocation to load balance to the number of machines 
we need to.

Right here I'm going to admit that this is not ideal.  This is somewhat 
"inherited," but I'm currently stuck solving this problem so I'll make 
the best of it :)

So now we're in this position where adds, deletes, and updates all can 
force us to recalculate this giant hash ring.  We can either simply lock 
the ring (but this actually takes long enough with a large number of 
real servers with large weights that it causes soft lock up detection!) 
or do RCU (a new allocation every time).  We're trying out the RCU thing 
now which is where this hurt us.

One aside, if it's safe to call synchronize_rcu in that path, then it 
may solve our most painful allocation problem in part.

tl;dr; it's an implementation in which you allocate on add, update, and 
delete.


> Patch 3:
> 	In the __ip_vs_update_dest change you are missing
> 	that ip_vs_rs_hash was called. In fact, I think we
> 	should call __ip_vs_unlink_dest(svc, dest, 0) and
> 	__ip_vs_del_dest(svc->net, dest, false)
> 	from ip_vs_add_dest() when both __ip_vs_update_dest(, 1)
> 	and ip_vs_new_dest() fail. This should work because
> 	add_dest call is the last thing we do on ADD, so
> 	the DEL operation but without calling del_dest should
> 	be ok. Such change invalidates Patch 1.
>
> 	Why above solution is preferred:
>
> 	At first look, scheduler that can return error from add_dest
> 	should not provide this dest to RCU readers via its schedule
> 	method and kfree(dest) looks ok.
>
> 	The problem is with the ip_vs_has_real_service() call
> 	and the ip_vs_rs_hash() call we do before add_dest,
> 	dest still can be accessed by RCU readers.
> 	For this reason we can not just kfree(dest).

Pending the rest of your reply I will reimplement it in this way.

>
> Patch 4:
> 	As for del_dest result, failure should not happen,
> we should be able to flush dests on module cleanup. I think, there
> should not be any allocations in this case.

As mentioned above, we may need to reallocate in this path.  But it does 
offend my sensibilities that this is the case, so I think it'd be fair 
to change the del_dest function to return void and we'd just find 
another way to do it.

Thanks,
-- 
Alex Gartrell <agartrell@fb.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values
  2015-02-24  7:51   ` Alex Gartrell
@ 2015-02-24  9:17     ` Julian Anastasov
  0 siblings, 0 replies; 8+ messages in thread
From: Julian Anastasov @ 2015-02-24  9:17 UTC (permalink / raw)
  To: Alex Gartrell; +Cc: wensong, horms, lvs-devel, kernel-team


	Hello,

On Mon, 23 Feb 2015, Alex Gartrell wrote:

> > 	Can a new create_dest(svc, bool for_update) method
> > solve such problem? Such modifications are serialized, so
> > create_dest can attach new structure to sched_data just before
> > the add_dest/upd_dest call. IPVS does not clone dests on
> > update but scheduler may need this for RCU update.
> > create_dest should be called early in __ip_vs_update_dest.
> > Note that kfree(dest) is not enough, see ip_vs_dest_free(),
> > more fields may be allocated before __ip_vs_update_dest is
> > reached.
> 
> More context on the custom scheduler:
> 
> It's a consistent hash ring algorithm which requires sorting all of the
> members and then inserting them one at a time (a entry per weight).  It
> requires a massive allocation to load balance to the number of machines we
> need to.

	The schedulers do not need RCU for list traversal
if they implement their own lists. But if dest is returned
from the schedule() method this dest is not freed while
in RCU grace period. And this logic is not part of the
scheduler. So, the scheduler may use some lock(s) which is
bad for SMP (most schedulers already use such lock) and
the only thing that they should care if their schedule
method uses RCU for their internal structures. Lock
is needed only as syncronization between schedule() and
the other methods that maintain internal structures.

> Right here I'm going to admit that this is not ideal.  This is somewhat
> "inherited," but I'm currently stuck solving this problem so I'll make the
> best of it :)
> 
> So now we're in this position where adds, deletes, and updates all can force
> us to recalculate this giant hash ring.  We can either simply lock the ring
> (but this actually takes long enough with a large number of real servers with
> large weights that it causes soft lock up detection!) or do RCU (a new
> allocation every time).  We're trying out the RCU thing now which is where
> this hurt us.
> 
> One aside, if it's safe to call synchronize_rcu in that path, then it may
> solve our most painful allocation problem in part.

	Yes, it is safe to use synchronize_rcu in
scheduler, except in schedule() method, of course.
IPVS holds just a mutex on configuration changes.
But synchronize_rcu should be avoided, it is always
possible to use another way.

> > Patch 4:
> > 	As for del_dest result, failure should not happen,
> > we should be able to flush dests on module cleanup. I think, there
> > should not be any allocations in this case.
> 
> As mentioned above, we may need to reallocate in this path.  But it does
> offend my sensibilities that this is the case, so I think it'd be fair to
> change the del_dest function to return void and we'd just find another way to
> do it.

	Looks like, patch 4 can be used, just
use 'dest->flags = old_flags;' before list_add_rcu.
Because the module removal will not call del_dest, the
done_service method called by ip_vs_unbind_scheduler()
is called when service is deleted. More details here:

- init_service:
	- can be called when service is edited, eg. when
	changing scheduler
	- we can see empty list or the list of all destinations
	before the scheduler is changed
- done_service:
	- sched_data must be freed after grace period and
	module exit should be delayed with synchronize_rcu
	to wait all RCU read-side critical sections to
	complete
	- can be called without calling del_dest when
	service is deleted
- add_dest:
	- after dest is linked
- del_dest:
	- after dest is unlinked
	- not called if service is deleted, in this case
	only done_service is called
- upd_dest:
	- after dest is edited (new weight)
- schedule:
	- called under RCU read lock, services are in list
	protected by RCU
	- needs _rcu if using svc->destinations
	- can call ip_vs_dest_hold
	- scheduler can hold dests in its state long after they are
	unlinked from svc, even without providing del_dest handler.
	- should check for IP_VS_DEST_F_AVAILABLE if
	using stale dest from sched_data structures
	- should use _bh suffix for sched_lock or other locks

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-02-24  9:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-24  5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 1/4] ipvs: move ip_vs_trash_put_dest into its own function Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 2/4] ipvs: drop dest_p from ip_vs_new_dest Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 3/4] ipvs: check return value of add_dest Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 4/4] ipvs: check return value of del_dest Alex Gartrell
2015-02-24  7:30 ` [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Julian Anastasov
2015-02-24  7:51   ` Alex Gartrell
2015-02-24  9:17     ` Julian Anastasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.