From: Alex Gartrell <agartrell@fb.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: wensong@linux-vs.org, horms@verge.net.au,
lvs-devel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values
Date: Mon, 23 Feb 2015 23:51:05 -0800 [thread overview]
Message-ID: <54EC2D69.5070300@fb.com> (raw)
In-Reply-To: <alpine.LFD.2.11.1502240914430.1755@ja.home.ssi.bg>
Hey Julian,
Thank you for your quick and thoughtful response :)
On 2/23/15 11:30 PM, Julian Anastasov wrote:
>> We have a custom scheduler that is failing some atomic allocations
>
> You can use GFP_KERNEL from add_dest method.
>
That should definitely improve things for us.
>
> Can a new create_dest(svc, bool for_update) method
> solve such problem? Such modifications are serialized, so
> create_dest can attach new structure to sched_data just before
> the add_dest/upd_dest call. IPVS does not clone dests on
> update but scheduler may need this for RCU update.
> create_dest should be called early in __ip_vs_update_dest.
> Note that kfree(dest) is not enough, see ip_vs_dest_free(),
> more fields may be allocated before __ip_vs_update_dest is
> reached.
More context on the custom scheduler:
It's a consistent hash ring algorithm which requires sorting all of the
members and then inserting them one at a time (a entry per weight). It
requires a massive allocation to load balance to the number of machines
we need to.
Right here I'm going to admit that this is not ideal. This is somewhat
"inherited," but I'm currently stuck solving this problem so I'll make
the best of it :)
So now we're in this position where adds, deletes, and updates all can
force us to recalculate this giant hash ring. We can either simply lock
the ring (but this actually takes long enough with a large number of
real servers with large weights that it causes soft lock up detection!)
or do RCU (a new allocation every time). We're trying out the RCU thing
now which is where this hurt us.
One aside, if it's safe to call synchronize_rcu in that path, then it
may solve our most painful allocation problem in part.
tl;dr; it's an implementation in which you allocate on add, update, and
delete.
> Patch 3:
> In the __ip_vs_update_dest change you are missing
> that ip_vs_rs_hash was called. In fact, I think we
> should call __ip_vs_unlink_dest(svc, dest, 0) and
> __ip_vs_del_dest(svc->net, dest, false)
> from ip_vs_add_dest() when both __ip_vs_update_dest(, 1)
> and ip_vs_new_dest() fail. This should work because
> add_dest call is the last thing we do on ADD, so
> the DEL operation but without calling del_dest should
> be ok. Such change invalidates Patch 1.
>
> Why above solution is preferred:
>
> At first look, scheduler that can return error from add_dest
> should not provide this dest to RCU readers via its schedule
> method and kfree(dest) looks ok.
>
> The problem is with the ip_vs_has_real_service() call
> and the ip_vs_rs_hash() call we do before add_dest,
> dest still can be accessed by RCU readers.
> For this reason we can not just kfree(dest).
Pending the rest of your reply I will reimplement it in this way.
>
> Patch 4:
> As for del_dest result, failure should not happen,
> we should be able to flush dests on module cleanup. I think, there
> should not be any allocations in this case.
As mentioned above, we may need to reallocate in this path. But it does
offend my sensibilities that this is the case, so I think it'd be fair
to change the del_dest function to return void and we'd just find
another way to do it.
Thanks,
--
Alex Gartrell <agartrell@fb.com>
next prev parent reply other threads:[~2015-02-24 7:51 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-24 5:07 [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Alex Gartrell
2015-02-24 5:07 ` [RFC PATCH net-next 1/4] ipvs: move ip_vs_trash_put_dest into its own function Alex Gartrell
2015-02-24 5:07 ` [RFC PATCH net-next 2/4] ipvs: drop dest_p from ip_vs_new_dest Alex Gartrell
2015-02-24 5:07 ` [RFC PATCH net-next 3/4] ipvs: check return value of add_dest Alex Gartrell
2015-02-24 5:07 ` [RFC PATCH net-next 4/4] ipvs: check return value of del_dest Alex Gartrell
2015-02-24 7:30 ` [RFC PATCH net-next 0/4] ipvs: check scheduler callback return values Julian Anastasov
2015-02-24 7:51 ` Alex Gartrell [this message]
2015-02-24 9:17 ` Julian Anastasov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54EC2D69.5070300@fb.com \
--to=agartrell@fb.com \
--cc=horms@verge.net.au \
--cc=ja@ssi.bg \
--cc=kernel-team@fb.com \
--cc=lvs-devel@vger.kernel.org \
--cc=wensong@linux-vs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.