From: Alexey Andriyanov <alan@al-an.info>
To: Alex Gartrell <agartrell@fb.com>, lvs-devel@vger.kernel.org
Cc: dsp@fb.com, kernel-team <Kernel-team@fb.com>, ps@fb.com
Subject: Re: IPVS Health Checking Best Practices
Date: Fri, 19 Sep 2014 08:31:05 +0400
Message-ID: <541BB189.6050901@al-an.info>
In-Reply-To: <541B4E22.6080802@fb.com>
Hi, Alex.
We run our health checks through the tunnel to the real server itself. The encapsulated packet looks like this:
CHK_SRC -> RS1 [ (IPIP) CHK_SRC -> VIP1 ].
This can be done without maintaining iptables rules or tunnel interfaces per real server.
Simply apply the fwmark that corresponds to the target RS to the checker socket via SO_MARK. Then direct all marked packets in the OUTPUT chain to NFQUEUE and encapsulate each packet in user space, selecting the tunnel endpoint based on its fwmark.
We use keepalived + this little tool for that: https://github.com/andriyanov/check-tun
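
The two user-space steps described above (marking the checker socket, then wrapping the packet in an outer IPIP header chosen by fwmark) can be sketched roughly as follows. This is only an illustration and not how check-tun itself is written: the MARK_TO_RS table, its addresses, and all function names are hypothetical, the outer source address is assumed to be the same CHK_SRC as the inner one, and the code that receives packets from NFQUEUE and re-sends them is left out.

```python
import socket
import struct

# Hypothetical table: fwmark -> real server (tunnel endpoint) address.
MARK_TO_RS = {1: "10.0.0.11", 2: "10.0.0.12"}

# socket.SO_MARK is exposed on Linux; 36 is its value on common Linux ABIs.
SO_MARK = getattr(socket, "SO_MARK", 36)
IPPROTO_IPIP = 4  # IP-in-IP, protocol number 4


def mark_checker_socket(sock, fwmark):
    """Tag a health-check socket with an fwmark (needs CAP_NET_ADMIN)."""
    sock.setsockopt(socket.SOL_SOCKET, SO_MARK, fwmark)


def ipv4_checksum(header):
    """One's-complement sum over the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate_ipip(inner, fwmark):
    """Wrap an inner IPv4 packet (CHK_SRC -> VIP) as CHK_SRC -> RS [IPIP]."""
    src = inner[12:16]  # assumption: reuse the inner source as outer source
    dst = socket.inet_aton(MARK_TO_RS[fwmark])  # endpoint chosen by fwmark
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,             # version 4, header length 5 words
        0,                # TOS
        20 + len(inner),  # total length
        0, 0,             # identification, flags/fragment offset
        64,               # TTL
        IPPROTO_IPIP,
        0,                # checksum placeholder
        src, dst,
    )
    checksum = struct.pack("!H", ipv4_checksum(header))
    return header[:10] + checksum + header[12:] + inner
```

A checker would call mark_checker_socket() before connect(), and the NFQUEUE consumer would pass each queued packet together with its mark to encapsulate_ipip() before sending the result on a raw socket.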
19.09.2014 01:26, Alex Gartrell wrote:
> Hello All,
>
> Today, we run IPVS on a number of hosts. Each of these hosts has a python process responsible for ensuring the health of pool members and then updating their weights as necessary.
>
> We do these health checks via IPVS for two reasons:
> 1) Different VIPs have different listeners on our real servers, so we can't just use the regular host address
> 2) We want to ensure that decapsulation is happening appropriately.
>
> The way we do this today is a giant hack. We have a scheduler that we've not (yet) open sourced that does consistent hashing, and someone just wired in a couple additional sysctls that will allow you to do the following:
>
> If a request is from $MAGIC_IP and the source port is >= $MAGIC_PORT, then send it to pool->members[($SRC_PORT - $MAGIC_PORT) % $N].
>
> I'd like to solve this problem more generally.
>
> The other solution I've heard of is using fwmarks, but that kind of sucks from a configuration perspective (because you have to add in all of the persistent vips and everything).
>
> Here are some other ideas:
>
> 1) Map the socket itself to a particular pool with a netlink invocation or something
>
> 2) Provide a way to bind specific src addr, port tuples to specific destination (though this is a bummer because you have to reserve port space)
>
> But I'm completely open to ideas and I think we're willing to do the work to make this happen.
>
>
> Thanks,
>
--
Best regards,
Alexey