All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Ricardo Leitner <mleitner@redhat.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: lvs-devel@vger.kernel.org, hannes@redhat.com, jbrouer@redhat.com
Subject: Re: [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT
Date: Fri, 12 Dec 2014 09:58:21 -0200	[thread overview]
Message-ID: <548AD85D.1070109@redhat.com> (raw)
In-Reply-To: <alpine.LFD.2.11.1412112207350.2098@ja.home.ssi.bg>

Hi,

On 11-12-2014 20:39, Julian Anastasov wrote:
>
> 	Hello,
>
> On Thu, 11 Dec 2014, Marcelo Ricardo Leitner wrote:
>
>> On 10-12-2014 19:27, Julian Anastasov wrote:
>>>
>>>   int conn_reuse_mode;
>>>
>>>   conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
>>>   if (conn_reuse_mode && !iph.fragoffs &&
>>>       is_new_conn(skb, &iph) && cp &&
>>>       ...
>>>       unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>>
>> Looks great! Thanks for all the directions.
>>
>> One question: at this if(), shouldn't we also check for &cp->n_control or you
>> considered it in the '...'? Because as you said, if this connection controls
>> another one we won't expire it and AFAICT we may end up with two duplicate
>> entries for the given src host/port/service.
>>
>> Currently I'm with:
>>          if (conn_reuse_mode && !iph.fragoffs &&
>>              is_new_conn(skb, &iph) && cp &&
>>              !atomic_read(&cp->n_control) &&
>>              ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
>>               unlikely(!atomic_read(&cp->dest->weight))) ||
>>               unlikely(is_new_conn_expected(cp, conn_reuse_mode)))) {
>
> 	I think, the cp->n_control check is not needed,
> we can live with many connections... in master server.

Okay, good.

> 	But it looks like the new mechanism opens the door
> for problems with the sync protocol. The master will
> send sync messages for the new connection that we
> create but backup server will hit the old connection
> that can be to another real server (daddr). Backup will update
> the cp->flags and cp->state but cp->dest can not be changed.
> Our old (expired) connection will try to send sync messages
> but as its state does not change ip_vs_sync_conn_needed()
> will prevent the sending for FW/TW. Even if the old connection
> does not cause problems, the new connection is not properly synced,
> ip_vs_proc_conn() will call ip_vs_conn_in_get() and
> we do not notice that daddr/dport differs.
>
> 	May be it can be solved by adding daddr+dport
> comparison after the ip_vs_conn_in_get() call in
> ip_vs_proc_conn(), so that we can support multiple
> connections that differ in daddr+dport. Or even
> daddr+dport check in ip_vs_conn_in_get just for the backup.
> But even in this case there is a risk because the sync messages
> can come in random order, for example, a short connection
> is expired in TW state, we create new connection but
> the backup server gets sync messages in reverse order,
> i.e. first for the new connection.
>
> 	If above is not implemented, the
> '!(ipvs->sync_state & IP_VS_STATE_MASTER) &&' check
> should prevent the expiration. We will create new connection
> due to weight=0 but backup can remain with wrong daddr
> from the old connection, that is the price we pay on
> rescheduling to different real server.
>
> 	And may be we have to add also a cp->control check
> in is_new_conn_expected(), the reason: persistence
> should have priority. 'cp->control' check should be in
> is_new_conn_expected(). IIRC, people use expire_nodest_conn
> even for setups with persistence, so cp->control should
> not prevent rescheduling in this case. cp->control also covers
> the case with FTP DATA, such conns should use the same
> real server.
>
> 	What about:
>
> static inline bool can_expire_connection()
> {
> 	/* Controlled (FTP DATA or persistence)? */
> 	if (cp->control)
> 		return false;
> 	---> Below is the previous version, unchanged:
>          return (cp->state == IP_VS_TCP_S_TIME_WAIT) ||
>                 ((conn_reuse_mode & 2) &&
>                  cp->state == IP_VS_TCP_S_FIN_WAIT &&
>                  (cp->flags & IP_VS_CONN_F_NOOUTPUT));
> }
>
> 	New version:
>
>          conn_reuse_mode = sysctl_conn_reuse_mode(ipvs);
>          if (conn_reuse_mode && !iph.fragoffs &&
>              is_new_conn(skb, &iph) && cp &&
> 	    ((unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest &&
> 	      unlikely(!atomic_read(&cp->dest->weight))) ||
> 	     (!(ipvs->sync_state & IP_VS_STATE_MASTER) &&
> 	      unlikely(can_expire_connection(cp, conn_reuse_mode))))) {
>
> 	As result, actions by priority are as follows:
>
> - reschedule for expire_nodest_conn=1 when weight=0, even
> if daddr/dport can become different in backup, even on persistence
>
> - sync protocol is active: reuse to prevent conns with different
> real server in master and backup
>
> - reuse on persistence
>
> - reschedule on FW/TW: when no sync and no persistence
>
> - reuse, by default

TBH now I have to read more on the sync mechanism, it's been a while. I'll try 
going with the first option, a way to get this properly synced to backup 
server instead of disabling it when with sync enabled, because this last would 
put a severe limitation on this feature. I haven't seen many single node setups...

I'll get back to you by Tue I think.

Thanks,
Marcelo


  reply	other threads:[~2014-12-12 11:58 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-08 14:27 [PATCH RFC] ipvs: reschedule new connections if previous was on FIN_WAIT or TIME_WAIT Marcelo Ricardo Leitner
2014-12-08 14:29 ` Marcelo Ricardo Leitner
2014-12-09 23:37 ` Julian Anastasov
2014-12-10 12:34   ` Marcelo Ricardo Leitner
2014-12-10 16:56     ` Marcelo Ricardo Leitner
2014-12-10 21:27       ` Julian Anastasov
2014-12-11 12:44         ` Marcelo Ricardo Leitner
2014-12-11 22:39           ` Julian Anastasov
2014-12-12 11:58             ` Marcelo Ricardo Leitner [this message]
2014-12-30 19:49               ` Marcelo Ricardo Leitner
2015-01-05 22:26                 ` Julian Anastasov
2015-01-06 12:12                   ` Marcelo Ricardo Leitner
2015-01-06 21:06                     ` Julian Anastasov
2015-01-07 15:52                       ` Marcelo Ricardo Leitner
2015-01-07 19:31                         ` Julian Anastasov
2015-01-08 12:35                           ` Marcelo Ricardo Leitner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=548AD85D.1070109@redhat.com \
    --to=mleitner@redhat.com \
    --cc=hannes@redhat.com \
    --cc=ja@ssi.bg \
    --cc=jbrouer@redhat.com \
    --cc=lvs-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.