From: Florian Westphal <fw@strlen.de>
To: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Cc: Florian Westphal <fw@strlen.de>,
Pablo Neira Ayuso <pablo@netfilter.org>,
netfilter@vger.kernel.org
Subject: Re: Most optimal method to dump UDP conntrack entries
Date: Fri, 18 Oct 2024 13:33:18 +0200 [thread overview]
Message-ID: <20241018113318.GA28324@breakpoint.cc> (raw)
In-Reply-To: <CABhP=tY=YnMPSMH0-T2oxHyE2Uzoy=RnHQmO4DRQ_Go8xzMGeg@mail.gmail.com>
Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> > The UDP tracker is old and refreshs the timeout for every packet, but
> > who says it needs to stay that way?
> >
> > With very simple patch to only refresh in 'reply' dir:
> >
> > [NEW] udp 17 30 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
> > [DESTROY] udp 17 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
> > [NEW] udp 17 30 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
> > [DESTROY] udp 17 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
> > [NEW] udp 17 30 src=127.0.0.1 dst=127.0.0.1 sport=49819 dport=12345 [UNREPLIED] src=127.0.0.1 dst=127.0.0.1 sport=12345 dport=49819
> >
> > This is with a unidirectional UDP stream.
> >
> > diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
> > --- a/net/netfilter/nf_conntrack_proto_udp.c
> > +++ b/net/netfilter/nf_conntrack_proto_udp.c
> > @@ -17,6 +17,7 @@
> > #include <linux/netfilter.h>
> > #include <linux/netfilter_ipv4.h>
> > #include <linux/netfilter_ipv6.h>
> > +#include <net/netfilter/nf_conntrack_acct.h>
> > #include <net/netfilter/nf_conntrack_l4proto.h>
> > #include <net/netfilter/nf_conntrack_ecache.h>
> > #include <net/netfilter/nf_conntrack_timeout.h>
> > @@ -80,6 +81,35 @@ static bool udp_error(struct sk_buff *skb,
> > return false;
> > }
> >
> > +static void nf_ct_refresh_udp(struct nf_conn *ct,
> > + enum ip_conntrack_dir dir,
> > + u32 len, u32 extra_jiffies)
> > +{
> > + /* Only update if this is not a fixed timeout */
> > + if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status))
> > + return nf_ct_acct_update(ct, dir, len);
> > +
> > + switch (dir) {
> > + case IP_CT_DIR_ORIGINAL:
> > + /* do not refresh in original dir, else stale dnat mapping
> > + * can be kept alive indefinitely.
> > + */
> > + if (nf_ct_is_confirmed(ct))
> > + return nf_ct_acct_update(ct, dir, len);
> > + break;
> > + case IP_CT_DIR_REPLY:
> > + extra_jiffies += nfct_time_stamp;
> > + break;
> > + case IP_CT_DIR_MAX: /* silence warning wrt. unhandled enum */
> > + break;
> > + }
> > +
> > + if (READ_ONCE(ct->timeout) != extra_jiffies)
> > + WRITE_ONCE(ct->timeout, extra_jiffies);
> > +
> > + nf_ct_acct_update(ct, dir, len);
> > +}
> > +
> > /* Returns verdict for packet, and may modify conntracktype */
> > int nf_conntrack_udp_packet(struct nf_conn *ct,
> > struct sk_buff *skb,
> > @@ -114,7 +144,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
> > stream = (status & IPS_ASSURED) == 0;
> > }
> >
> > - nf_ct_refresh_acct(ct, ctinfo, skb, extra);
> > + nf_ct_refresh_udp(ct, CTINFO2DIR(ctinfo), skb->len, extra);
> >
> > /* never set ASSURED for IPS_NAT_CLASH, they time out soon */
> > if (unlikely((status & IPS_NAT_CLASH)))
> > @@ -124,7 +154,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
> > if (stream && !test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
> > nf_conntrack_event_cache(IPCT_ASSURED, ct);
> > } else {
> > - nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]);
> > + nf_ct_refresh_udp(ct, CTINFO2DIR(ctinfo), skb->len, timeouts[UDP_CT_UNREPLIED]);
> > }
> > return NF_ACCEPT;
> > }
> >
> > The obvious counter-argument is "but what if its the reply side that is
> > sending, and we have a stale SNAT mapping?".
> >
> > Solution would be to track timestamp of last packet seen per direction,
> > and then stop refreshing the timeout if one side ceases sending.
> >
> > I think thats much better than what we have now and there would be no
> > need to actively zap UDP flows, they would timeout automatically once
> > one side ceases to transmit.
>
> If I understand correctly this will have some visible effects, before
> this change the same tuple will only hit the same destination because
> the conntrack entry will prevail, and after this patch the packet will
> be spread out randomly to the number of backends.
Nothing changes unless flow is 100% uni-directional (no
return traffic OR only reply traffic after initial packet).
Old behavior:
Packet Direction
1 Origin # create entry with 30s timeout, save current time
2 Reply # flag entry assured (cannot be fast-removed
# if conntrack table is full), refresh timeout to 30s
3 Origin # refresh timeout to 30s OR 2 Minutes, if current time is 2s in the past
4 Origin # refresh timeout to 30s OR 2 Minutes, if current time is 2s in the past
...
5 Origin # refresh timeout to 30s OR 2 Minutes, if current time is 2s in the past
Thus, once DNAT mapping changes, conntrack entry remains alive for as
long as origin sends more packets.
After change:
Packet Direction
1 Origin # create entry with 30s timeout, save current time
2 Reply # flag entry assured (cannot be fast-removed
# if conntrack table is full), refresh timeout to 30s
3 Origin # do not refresh timeout
4 Origin # do not refresh timeout
...
5 Origin # do not refresh timeout
Once 30s have elapsed, this entry will expire, even if packets are seen.
Next packet will create a new conntrack, and a new NAT mapping could be
chosen.
If there is reply traffic, nothing changes:
Packet Direction
1 Origin # create entry with 30s timeout, save current time
2 Reply # flag entry assured (cannot be fast-removed
# if conntrack table is full), refresh timeout to 30s
3 Origin # do not refresh timeout
4 Reply # refresh timeout to 30s OR 2 Minutes, if current time is 2s in the past
...
5 Origin # do not refresh timeout
Thats the general idea anyway.
I'll try to make this more robust and also let it timeout if we have
only reply traffic without any origin packets.
> I'm aware of some requests to have this behavior in kubernetes for UDP
> service, but I'm scared that there can be some people relying on this
> behavior ... and all the possible edge cases this will open, what
> happens if we have two UDP packets in one direction before the first
> reply is seen?
Same as what happens now, 2nd packet follows NAT mapping of first one.
next prev parent reply other threads:[~2024-10-18 11:33 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-17 10:26 Most optimal method to dump UDP conntrack entries Antonio Ojea
2024-10-17 12:46 ` Florian Westphal
2024-10-17 16:36 ` Pablo Neira Ayuso
2024-10-17 22:10 ` Antonio Ojea
2024-10-17 23:30 ` Florian Westphal
2024-10-18 11:05 ` Antonio Ojea
2024-10-18 11:33 ` Florian Westphal [this message]
2024-10-18 14:10 ` Antonio Ojea
2024-10-21 13:53 ` Florian Westphal
2024-10-23 9:03 ` Benny Lyne Amorsen
2024-11-10 21:50 ` Florian Westphal
2024-11-11 6:33 ` Antonio Ojea
2024-11-11 12:06 ` Pablo Neira Ayuso
2024-11-11 12:09 ` Florian Westphal
2024-11-11 12:29 ` Pablo Neira Ayuso
2024-11-11 12:54 ` Florian Westphal
2024-11-12 9:16 ` Pablo Neira Ayuso
2024-11-12 9:20 ` Pablo Neira Ayuso
2024-11-12 14:41 ` Antonio Ojea
2024-11-12 14:43 ` Antonio Ojea
2024-11-12 16:18 ` Florian Westphal
2024-11-15 4:11 ` Antonio Ojea
2024-12-01 17:00 ` Antonio Ojea
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241018113318.GA28324@breakpoint.cc \
--to=fw@strlen.de \
--cc=antonio.ojea.garcia@gmail.com \
--cc=netfilter@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox