Linux Netfilter discussions
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Florian Westphal <fw@strlen.de>
Cc: Antonio Ojea <antonio.ojea.garcia@gmail.com>, netfilter@vger.kernel.org
Subject: Re: Most optimal method to dump UDP conntrack entries
Date: Thu, 17 Oct 2024 18:36:20 +0200
Message-ID: <ZxE8-pDn9Alzm50K@calendula>
In-Reply-To: <20241017124632.GC12005@breakpoint.cc>

On Thu, Oct 17, 2024 at 02:46:32PM +0200, Florian Westphal wrote:
> Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> > In the context of Kubernetes, when DNATing entries for UDP Services,
> > we need to deal with some edge cases where some UDP entries are left
> > orphaned but blackhole the traffic to the new endpoints.
> > 
> > At high level, the scenario is:
> > - Client IP_A sends UDP traffic to VirtualIP IP_B --> Kubernetes
> > Translates this to Endpoint IP_C
> > - Endpoint IP_C is replaced by Endpoint IP_D, but since Client IP_A
> > does not stop sending traffic, the conntrack entry IP_A IP_B --> IP_C
> > takes precedence and is being renewed, so traffic is not sent to the
> > new Endpoint IP_D and is lost.
> > 
> > To solve this problem, we have some heuristics to detect those
> > scenarios when the endpoints change and flush the conntrack entries,
> > however, since this is event based, if we lose the event that
> > triggered the problem, or the cleanup of the entry fails for some
> > other reason, the user needs to flush the entries manually.
> > 
> > We are implementing a new approach to solve this: we list all the UDP
> > conntrack entries using netlink, compare against the existing
> > programmed nftables/iptables rules, and flush the ones we know are
> > stale.
> > 
> > During the implementation review, the question [1] this raises is, how
> > impactful is it to dump all the conntrack entries each time we program
> > the iptables/nftables rules (this can be every 1s on nodes with a lot
> > of entries)?
> > Is this approach completely safe?
> > Should we try to read from procfs instead?
> 
> Walking all conntrack entries in 1s intervals is going to be slow, no
> matter the chosen interface.  Even doing the filtering in the kernel to
> not dump all entries but only those that match udp/port/ip criteria is
> not going to change it.
> 
> Also, both proc and netlink dumps can miss entries (albeit it's rare)
> if parallel insertions/deletes happen (which is normal on a busy system).
> 
> I wonder why the appropriate delete requests cannot be done when the
> mapping is altered, I mean, you must have some code that issues
> either iptables -t nat -D ... or nft delete element ... or similar.
> 
> If you do that, why not also fire off the conntrack -D request
> afterwards?  Or are these publish/withdraw so frequent that this
> doesn't matter compared to poll based approach?
> 
> Something like
>    conntrack -D --protonum 17 --orig-dst $vserver --orig-port-dst 53 --reply-src $rserver --reply-port-src 5353
> 
> would zap everything to $rserver mapped to $vserver from client point of view.

This reminds me, it would be good to extend the conntrack utility to use
the new kernel API to filter in the kernel and delete.

I will try to get to this.
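
For reference, the delete-on-change idea quoted above could be wrapped
in a small helper that is called right after the NAT rule or set element
for an endpoint is removed. This is only a rough sketch: the function
name flush_udp_mapping and the DRY_RUN variable are made up for
illustration, and the conntrack options are the ones from the command
quoted above. By default it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper (not part of conntrack-tools): build the delete
# request for one UDP virtual-server -> real-server mapping.
# DRY_RUN=1 (the default here) prints the command instead of running it;
# running it for real needs root and the conntrack utility installed.
flush_udp_mapping() {
    vserver=$1   # virtual IP the client sends to (original destination)
    vport=$2     # virtual port
    rserver=$3   # old endpoint IP (source address in the reply direction)
    rport=$4     # old endpoint port

    # Reuse the positional parameters to hold the command line.
    set -- conntrack -D --protonum 17 \
        --orig-dst "$vserver" --orig-port-dst "$vport" \
        --reply-src "$rserver" --reply-port-src "$rport"

    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example (addresses are placeholders):
#   flush_udp_mapping 10.96.0.10 53 10.244.1.5 5353
```

Setting DRY_RUN=0 would execute the delete instead of printing it. The
sketch issues one request per removed endpoint, so it avoids walking the
whole table, but it still depends on the caller knowing when a mapping
was withdrawn.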
