From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Simon Horman <horms@verge.net.au>
Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
netfilter-devel@vger.kernel.org,
Wensong Zhang <wensong@linux-vs.org>,
Julian Anastasov <ja@ssi.bg>
Subject: Re: [PATCH 5/9] ipvs: use adaptive pause in master thread
Date: Mon, 2 Apr 2012 13:11:45 +0200
Message-ID: <20120402111145.GA15181@1984>
In-Reply-To: <1332320185-27157-6-git-send-email-horms@verge.net.au>
Hi Simon et al.
On Wed, Mar 21, 2012 at 05:56:20PM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
>
> High rate of sync messages in master can lead to
> overflowing the socket buffer and dropping the messages.
> Instead of using fixed pause try to adapt to the sync
> rate, so that we do not wakeup too often while at the same
> time staying below the socket buffer limit.
I see. You are avoiding the congestion in the socket queue by putting
the pressure on your sync_buff queue (I don't find any limit in
the code on the amount of memory that it may consume, btw).
Please, correct me if I'm wrong, but from this I can see you're
assuming that there's always room in the sync_buff queue to store
messages. This is valid if the high peak of traffic lasts for a short
time. However, if it lasts long, the worker thread may not be able to
consume all the messages in time under such sustained stress, and the
sync_buff queue may keep growing more and more.
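(Back-of-the-envelope, with made-up numbers: if the master fills one
~1500-byte sync buffer per millisecond while the sender can only push
one every two milliseconds, the queue grows by roughly 750 KB per
second, with no upper bound in sight.)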
Moreover, the backup may receive sync messages describing old
states that are not useful anymore to recover the load balancing in
case of failover.
One more concern, please see below.
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
> net/netfilter/ipvs/ip_vs_sync.c | 60 +++++++++++++++++++++++++++++---------
> 1 files changed, 46 insertions(+), 14 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
> index 0e36679..9201c43 100644
> --- a/net/netfilter/ipvs/ip_vs_sync.c
> +++ b/net/netfilter/ipvs/ip_vs_sync.c
> @@ -1392,18 +1392,22 @@ ip_vs_send_async(struct socket *sock, const char *buffer, const size_t length)
> return len;
> }
>
> -static void
> +static int
> ip_vs_send_sync_msg(struct socket *sock, struct ip_vs_sync_mesg *msg)
> {
> int msize;
> + int ret;
>
> msize = msg->size;
>
> /* Put size in network byte order */
> msg->size = htons(msg->size);
>
> - if (ip_vs_send_async(sock, (char *)msg, msize) != msize)
> - pr_err("ip_vs_send_async error\n");
> + ret = ip_vs_send_async(sock, (char *)msg, msize);
> + if (ret >= 0 || ret == -EAGAIN)
> + return ret;
> + pr_err("ip_vs_send_async error %d\n", ret);
> + return 0;
> }
>
> static int
> @@ -1428,33 +1432,61 @@ ip_vs_receive(struct socket *sock, char *buffer, const size_t buflen)
> return len;
> }
>
> +/* Get next buffer to send */
> +static inline struct ip_vs_sync_buff *
> +next_sync_buff(struct netns_ipvs *ipvs)
> +{
> + struct ip_vs_sync_buff *sb;
> +
> + sb = sb_dequeue(ipvs);
> + if (sb)
> + return sb;
> + /* Do not delay entries in buffer for more than 2 seconds */
> + return get_curr_sync_buff(ipvs, 2 * HZ);
> +}
>
> static int sync_thread_master(void *data)
> {
> struct ip_vs_sync_thread_data *tinfo = data;
> struct netns_ipvs *ipvs = net_ipvs(tinfo->net);
> - struct ip_vs_sync_buff *sb;
> + struct ip_vs_sync_buff *sb = NULL;
> + int pause = HZ / 10;
>
> pr_info("sync thread started: state = MASTER, mcast_ifn = %s, "
> "syncid = %d\n",
> ipvs->master_mcast_ifn, ipvs->master_syncid);
>
> while (!kthread_should_stop()) {
> - while ((sb = sb_dequeue(ipvs))) {
> - ip_vs_send_sync_msg(tinfo->sock, sb->mesg);
> - ip_vs_sync_buff_release(sb);
> - }
> + int count = 0;
>
> - /* check if entries stay in ipvs->sync_buff for 2 seconds */
> - sb = get_curr_sync_buff(ipvs, 2 * HZ);
> - if (sb) {
> - ip_vs_send_sync_msg(tinfo->sock, sb->mesg);
> - ip_vs_sync_buff_release(sb);
> + for (;;) {
> + if (!sb) {
> + sb = next_sync_buff(ipvs);
> + if (!sb)
> + break;
> + }
> + if (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) >= 0 ||
> + pause == 1) {
> + ip_vs_sync_buff_release(sb);
> + sb = NULL;
> + count++;
> + } else {
> + pause = max(1, (pause >> 1));
> + break;
> + }
> }
>
> - schedule_timeout_interruptible(HZ);
> + /* Max packets to send at once */
> + if (count > 200)
> + pause = max(1, (pause - HZ / 20));
> + else if (count < 20)
> + pause = min(HZ / 4, (pause + HZ / 20));
> + schedule_timeout_interruptible(sb ? 1 : pause);
This rate-limits the number of times the worker is woken up.
Still, this seems to me like an ad-hoc congestion solution. There's no
justification for why those numbers have been chosen, e.g. why do we
assume that we're congested once we've sent 200 packets in one single
loop?
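To make those numbers concrete, assuming HZ = 1000: the pause starts
at HZ / 10 = 100 jiffies; each loop that sends more than 200 packets
subtracts HZ / 20 = 50, so two busy iterations already pin it at the
minimum of 1 jiffy; each loop that sends fewer than 20 packets adds 50
back, up to the HZ / 4 = 250 ceiling; and each -EAGAIN from the socket
halves it. These look like hand-tuned constants to me.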
To me, congestion control is a complicated thing (there is plenty of
literature on how TCP avoids it). I'm not sure how many patches will
have to follow this one to try to improve your congestion control
solution.
So, my question is, are you sure you want to enter this domain?
IMO, it is better to stick to some simple solution, i.e. just drop
messages if the worker thread receives a high peak of work, than to
define some sort of rate-limiting solution based on assumptions that
are not generic enough for every setup.
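Something like this is what I have in mind (rough sketch only, not
even compile-tested; sync_queue_len and sync_queue_max are
hypothetical fields that would have to be added to struct netns_ipvs):

	/* Hypothetical bounded enqueue: account the queue length and
	 * drop the new buffer once a configurable maximum is reached,
	 * instead of queueing without limit.
	 */
	static inline void sb_queue_tail(struct netns_ipvs *ipvs,
					 struct ip_vs_sync_buff *sb)
	{
		spin_lock(&ipvs->sync_lock);
		if ((ipvs->sync_state & IP_VS_STATE_MASTER) &&
		    ipvs->sync_queue_len < ipvs->sync_queue_max) {
			list_add_tail(&sb->list, &ipvs->sync_queue);
			ipvs->sync_queue_len++;
		} else {
			/* Queue full: drop the message rather than
			 * let the queue grow unboundedly.
			 */
			ip_vs_sync_buff_release(sb);
		}
		spin_unlock(&ipvs->sync_lock);
	}

The dequeue path would decrement sync_queue_len accordingly. Dropping
at enqueue time also bounds how stale the states sitting in the queue
can become.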