public inbox for netdev@vger.kernel.org
From: Paolo Abeni <pabeni@redhat.com>
To: Ilya Maximets <i.maximets@ovn.org>, netdev@vger.kernel.org
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@resnulli.us>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
	Henrik Steen <henrist@henrist.net>,
	Olivier Tilmans <olivier.tilmans@nokia.com>,
	Bob Briscoe <research@bobbriscoe.net>,
	Olga Albisser <olga@albisser.org>,
	GangMin Kim <km.kim1503@gmail.com>,
	Eelco Chaudron <echaudro@redhat.com>,
	Aaron Conole <aconole@redhat.com>,
	Florian Westphal <fw@strlen.de>
Subject: Re: [PATCH net] net_sched: act_ct: drop all packets when not attached to ingress
Date: Tue, 17 Feb 2026 16:52:32 +0100
Message-ID: <ecfc49ae-99c9-414e-8b8d-211112fc5873@redhat.com>
In-Reply-To: <cc6bfb4a-4a2b-42d8-b9ce-7ef6644fb22b@ovn.org>

Adding Florian, too

On 2/17/26 3:49 PM, Ilya Maximets wrote:
> On 2/17/26 10:38 AM, Paolo Abeni wrote:
>> Since the blamed commit below, classify can return TC_ACT_CONSUMED while
>> the current skb is being held by the defragmentation engine. As reported by
>> GangMin Kim, that may cause a UaF when the defrag engine later tries to
>> touch such a packet again.
>>
>> act_ct was never meant to be used outside of the ingress path. Making
>> defrag really work for act_ct outside such constraints ranges from very
>> difficult to completely impossible.
>>
>> Address the issue by making act_ct drop any packet when not attached to the
>> ingress path, and additionally emit a warning about the bad
>> configuration.
>>
>> Reported-by: GangMin Kim <km.kim1503@gmail.com>
>> Fixes: 8f9516daedd6 ("sched: Add enqueue/dequeue of dualpi2 qdisc")
>> CC: stable@vger.kernel.org
>> Link: https://patch.msgid.link/16f6b264373ad60ab18eb8525809e7267442afa7.1770394932.git.pabeni@redhat.com
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> Catching the bad configuration at runtime instead of init time to reduce
>> complexity
>> ---
>>  net/sched/act_ct.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> index 81d488655793..e8eb0d195f4a 100644
>> --- a/net/sched/act_ct.c
>> +++ b/net/sched/act_ct.c
>> @@ -987,6 +987,11 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
>>  	tcf_lastuse_update(&c->tcf_tm);
>>  	tcf_action_update_bstats(&c->common, skb);
>>  
>> +	if (!skb_at_tc_ingress(skb)) {
>> +		pr_warn_once("act_CT should be attached at ingress!\n");
> 
> Hi, Paolo.  I didn't test this yet, but I'm a little concerned about this
> change.  For TC offload of OVS tunnels, we create an egress qdisc for internal
> ports, a.k.a. bridge ports.  This is necessary for tunnels to work.
> A typical setup looks like this:
> 
>     Bridge br-int (10.0.0.1)
>         Port br-int
>             Interface br-int
>                 type: internal
>         Port vm
>             Interface vm
>         Port vxlan0
>             Interface vxlan0
>                 type: vxlan
>                 options: {remote_ip="172.31.1.1"}
> 
>     Bridge br-phy (172.31.1.100)
>         Port br-phy
>             Interface br-phy
>                 type: internal
>         Port eth0
>             Interface eth0
> 
> When a VM sends a packet that is supposed to go through the tunnel, it enters
> from the vm port and OVS forwards it into the vxlan0 LWT with the appropriate
> tunnel metadata.  Then it gets encapsulated and put into kernel routing to
> find the next hop, which is via br-phy bridge.  Packet enters br-phy and
> OVS forwards the packet to eth0.  There can be stateful processing in the
> br-phy bridge involving passing packets through conntrack.
> 
> With TC offload enabled, OVS creates the following filters:
> 
> 1. Ingress filter on vm interface that forwards packets to vxlan0.
> 2. Egress filter on br-phy interface that forwards encapsulated packets
>    to eth0 interface.
> 
> If some stateful processing is involved, both of those could have act_ct
> in them.
> 
> AFAIU, we have to use Egress qdisc for the br-phy, because the packet is
> egressing from the kernel routing via br-phy and the ingress qdisc is not
> invoked.  Ingress will be at play when packets are flowing in the opposite
> direction from eth0 to br-phy, as that's where they will ingress into
> standard kernel routing.
> 
> If act_ct is not allowed on egress, then stateful processing will
> not be possible in the br-phy bridge in this case.
> 
> Thoughts?

This looks very problematic. A slightly different patch tried to
somewhat preserve functionality, but it simply can't work in the
presence of IP fragments.

Why is stateful processing on br-int/port vm ingress not enough?

/P
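
For readers following the thread, the runtime check under discussion can
be modeled in user space. This is only a sketch, not the kernel code:
`struct pkt`, `ct_act_model()`, `warn_once()` and the `VERDICT_*` values
are hypothetical stand-ins for `struct sk_buff`, `tcf_ct_act()`,
`pr_warn_once()` and the TC verdicts. The drop on the non-ingress path is
assumed from the patch subject, since the quoted hunk is cut off right
after the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical verdicts; the kernel uses TC_ACT_* values instead. */
enum verdict { VERDICT_PIPE, VERDICT_SHOT };

/* Hypothetical packet model: only the hook direction matters here. */
struct pkt {
	bool at_ingress; /* what skb_at_tc_ingress(skb) would report */
};

/* Counts how many times the warning was actually printed. */
static int warn_count;

/* Models pr_warn_once(): the message is printed on the first call only. */
static void warn_once(const char *msg)
{
	static bool warned;

	if (!warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "%s", msg);
	}
}

/* Models the proposed check at the top of the action. */
static enum verdict ct_act_model(const struct pkt *p)
{
	assert(p != NULL);

	if (!p->at_ingress) {
		/* Assumed behavior: warn once, then drop every such packet. */
		warn_once("act_ct should be attached at ingress!\n");
		return VERDICT_SHOT;
	}

	/* Conntrack and defragmentation would run here in the real action. */
	return VERDICT_PIPE;
}
```

With this model, a packet seen on an egress hook such as the br-phy
filter described above is always dropped, which is exactly the case
Ilya's setup depends on.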


