All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Gartrell <agartrell@fb.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: wensong@linux-vs.org, horms@verge.net.au,
	lvs-devel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values
Date: Mon, 23 Feb 2015 23:51:05 -0800	[thread overview]
Message-ID: <54EC2D69.5070300@fb.com> (raw)
In-Reply-To: <alpine.LFD.2.11.1502240914430.1755@ja.home.ssi.bg>

Hey Julian,

Thank you for your quick and thoughtful response :)

On 2/23/15 11:30 PM, Julian Anastasov wrote:
>> We have a custom scheduler that is failing some atomic allocations
>
> 	You can use GFP_KERNEL from add_dest method.
>

That should definitely improve things for us.

>
> 	Can a new create_dest(svc, bool for_update) method
> solve such problem? Such modifications are serialized, so
> create_dest can attach new structure to sched_data just before
> the add_dest/upd_dest call. IPVS does not clone dests on
> update but scheduler may need this for RCU update.
> create_dest should be called early in __ip_vs_update_dest.
> Note that kfree(dest) is not enough, see ip_vs_dest_free(),
> more fields may be allocated before __ip_vs_update_dest is
> reached.

More context on the custom scheduler:

It's a consistent hash ring algorithm which requires sorting all of the 
members and then inserting them one at a time (a entry per weight).  It 
requires a massive allocation to load balance to the number of machines 
we need to.

Right here I'm going to admit that this is not ideal.  This is somewhat 
"inherited," but I'm currently stuck solving this problem so I'll make 
the best of it :)

So now we're in this position where adds, deletes, and updates all can 
force us to recalculate this giant hash ring.  We can either simply lock 
the ring (but this actually takes long enough with a large number of 
real servers with large weights that it causes soft lock up detection!) 
or do RCU (a new allocation every time).  We're trying out the RCU thing 
now which is where this hurt us.

One aside, if it's safe to call synchronize_rcu in that path, then it 
may solve our most painful allocation problem in part.

tl;dr; it's an implementation in which you allocate on add, update, and 
delete.


> Patch 3:
> 	In the __ip_vs_update_dest change you are missing
> 	that ip_vs_rs_hash was called. In fact, I think we
> 	should call __ip_vs_unlink_dest(svc, dest, 0) and
> 	__ip_vs_del_dest(svc->net, dest, false)
> 	from ip_vs_add_dest() when both __ip_vs_update_dest(, 1)
> 	and ip_vs_new_dest() fail. This should work because
> 	add_dest call is the last thing we do on ADD, so
> 	the DEL operation but without calling del_dest should
> 	be ok. Such change invalidates Patch 1.
>
> 	Why above solution is preferred:
>
> 	At first look, scheduler that can return error from add_dest
> 	should not provide this dest to RCU readers via its schedule
> 	method and kfree(dest) looks ok.
>
> 	The problem is with the ip_vs_has_real_service() call
> 	and the ip_vs_rs_hash() call we do before add_dest,
> 	dest still can be accessed by RCU readers.
> 	For this reason we can not just kfree(dest).

Pending the rest of your reply I will reimplement it in this way.

>
> Patch 4:
> 	As for del_dest result, failure should not happen,
> we should be able to flush dests on module cleanup. I think, there
> should not be any allocations in this case.

As mentioned above, we may need to reallocate in this path.  But it does 
offend my sensibilities that this is the case, so I think it'd be fair 
to change the del_dest function to return void and we'd just find 
another way to do it.

Thanks,
-- 
Alex Gartrell <agartrell@fb.com>

  reply	other threads:[~2015-02-24  7:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-24  5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 1/4] ipvs: move ip_vs_trash_put_dest into its own function Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 2/4] ipvs: drop dest_p from ip_vs_new_dest Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 3/4] ipvs: check return value of add_dest Alex Gartrell
2015-02-24  5:07 ` [RFC PATCH net-next 4/4] ipvs: check return value of del_dest Alex Gartrell
2015-02-24  7:30 ` [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Julian Anastasov
2015-02-24  7:51   ` Alex Gartrell [this message]
2015-02-24  9:17     ` Julian Anastasov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54EC2D69.5070300@fb.com \
    --to=agartrell@fb.com \
    --cc=horms@verge.net.au \
    --cc=ja@ssi.bg \
    --cc=kernel-team@fb.com \
    --cc=lvs-devel@vger.kernel.org \
    --cc=wensong@linux-vs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.