Re: Avoid race between tcp_packet packet processing and timeout set by a netfilter CTA_TIMEOUT message

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Tula Kraiser <tkraiser@arista.com>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: Avoid race between tcp_packet packet processing and timeout set by a netfilter CTA_TIMEOUT message
Date: Fri, 18 Nov 2022 16:13:53 +0100	[thread overview]
Message-ID: <Y3ehMVHetV5Vx7R3@salvia> (raw)
In-Reply-To: <CAKh0D7xP9rmwes4zjwDAYvrB706Au3aLvfA25NV0+sYR17+-NQ@mail.gmail.com>

Hi,

On Thu, Nov 10, 2022 at 10:03:43AM -0800, Tula Kraiser wrote:
> Hello,
> 
> We have been using the nat netfilter module to create NAT translations
> and then we offload the translations to our hardware. Once the
> translation is offloaded to hardware we expect only FIN and RST to be
> received by the linux stack. Once we finish programming the hardware
> we send a NETLINK message to the kernel setting the entry timeout to a
> larger value (we use the CTA_TIMEOUT for that). That's because we rely
> on hardware hitbit to indicate when the entry should be removed due to
> inactivity.

This sounds very much like the flowtable infrastructure [1], the TCP
FIN and RST handling is done in a similar way.

> Unfortunately there is a delay between receiving the notification of
> the translation (we subscribe to netfilter conntrack events for that)
> and the time we program the hardware where packets still make it into
> the kernel input queue. There is a race between the CTA_TIMEOUT
> message and the queue packets where the kernel can replace the timeout
> with its default values leading to the entry being removed
> prematurely.
> 
> 
> To avoid that we are proposing introducing a new attribute to the
> CTA_PROTOINFO for TCP where we set the IPS_FIXED_TIMEOUT_FLAG on the
> conntrack entry if the conntrack TCP state is less or equal to
> TCP_ESTABLISHED.  That takes care of the race.  We are modifying the
> tcp_packet routine to reset the IPS_FIXED_TIMEOUT_FLAG when the tcp
> state moves the established state so FINs can be processed correctly.
> 
> 
> Does this sound like a reasonable solution to the problem or are there
> better suggestions? Does this sound like an interesting patch to push
> upstream?

There are already several drivers that support hardware offload
through this infrastructure.

Someone contributed more detailed documentation about the flowtable [2].

Code can be found under net/netfilter/nf_flow_table*.c files.

[1] https://docs.kernel.org/networking/nf_flowtable.html
[2] https://thermalcircle.de/doku.php?id=blog:linux:flowtables_1_a_netfilter_nftables_fastpath

     prev parent reply	other threads:[~2022-11-18 15:14 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-10 18:03 Avoid race between tcp_packet packet processing and timeout set by a netfilter CTA_TIMEOUT message Tula Kraiser
2022-11-18 15:13 ` Pablo Neira Ayuso [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3ehMVHetV5Vx7R3@salvia \
    --to=pablo@netfilter.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=tkraiser@arista.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).