[PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
@ 2025-08-22  6:42 Yafang Shao
  2025-08-22  7:26 ` Nikolay Aleksandrov
  2025-08-28  7:55 ` Paolo Abeni
  0 siblings, 2 replies; 7+ messages in thread
From: Yafang Shao @ 2025-08-22  6:42 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms, daniel, bigeasy, tgraf,
	paulmck
  Cc: netdev, bpf, Yafang Shao

During recent testing with the netem qdisc to inject delays into TCP
traffic, we observed that our CLS BPF program failed to function correctly
due to incorrect classid retrieval from task_get_classid(). The issue
manifests in the following call stack:

        bpf_get_cgroup_classid+5
        cls_bpf_classify+507
        __tcf_classify+90
        tcf_classify+217
        __dev_queue_xmit+798
        bond_dev_queue_xmit+43
        __bond_start_xmit+211
        bond_start_xmit+70
        dev_hard_start_xmit+142
        sch_direct_xmit+161
        __qdisc_run+102             <<<<< Issue location
        __dev_xmit_skb+1015
        __dev_queue_xmit+637
        neigh_hh_output+159
        ip_finish_output2+461
        __ip_finish_output+183
        ip_finish_output+41
        ip_output+120
        ip_local_out+94
        __ip_queue_xmit+394
        ip_queue_xmit+21
        __tcp_transmit_skb+2169
        tcp_write_xmit+959
        __tcp_push_pending_frames+55
        tcp_push+264
        tcp_sendmsg_locked+661
        tcp_sendmsg+45
        inet_sendmsg+67
        sock_sendmsg+98
        sock_write_iter+147
        vfs_write+786
        ksys_write+181
        __x64_sys_write+25
        do_syscall_64+56
        entry_SYSCALL_64_after_hwframe+100

The problem occurs when multiple tasks share a single qdisc. In such cases,
__qdisc_run() may transmit skbs created by different tasks. Consequently,
task_get_classid() retrieves an incorrect classid since it references the
current task's context rather than the skb's originating task.

Given that dev_queue_xmit() always executes with bh disabled, we can safely
use in_softirq() instead of in_serving_softirq() to properly identify the
softirq context and obtain the correct classid.

The simple steps to reproduce this issue:
1. Add network delay to the network interface:
  such as: tc qdisc add dev bond0 root netem delay 1.5ms
2. Create two distinct net_cls cgroups, each running a network-intensive task
3. Initiate parallel TCP streams from both tasks to external servers.

Under this specific condition, the issue reliably occurs. The kernel
eventually dequeues an SKB that originated from Task-A while executing in
the context of Task-B.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

---

v1->v2: use softirq_count() instead of in_softirq()
---
 include/net/cls_cgroup.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index 7e78e7d6f015..668aeee9b3f6 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb)
 	 * calls by looking at the number of nested bh disable calls because
 	 * softirqs always disables bh.
 	 */
-	if (in_serving_softirq()) {
+	if (softirq_count()) {
 		struct sock *sk = skb_to_full_sk(skb);

 		/* If there is an sock_cgroup_classid we'll use that. */
-- 
2.43.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
  2025-08-22  6:42 [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run Yafang Shao
@ 2025-08-22  7:26 ` Nikolay Aleksandrov
  2025-08-22  7:34   ` Yafang Shao
  2025-08-28  7:55 ` Paolo Abeni
  1 sibling, 1 reply; 7+ messages in thread
From: Nikolay Aleksandrov @ 2025-08-22  7:26 UTC (permalink / raw)
  To: Yafang Shao, davem, edumazet, kuba, pabeni, horms, daniel,
	bigeasy, tgraf, paulmck
  Cc: netdev, bpf

On 8/22/25 09:42, Yafang Shao wrote:
> During recent testing with the netem qdisc to inject delays into TCP
> traffic, we observed that our CLS BPF program failed to function correctly
> due to incorrect classid retrieval from task_get_classid(). The issue
> manifests in the following call stack:
> 
>         bpf_get_cgroup_classid+5
>         cls_bpf_classify+507
>         __tcf_classify+90
>         tcf_classify+217
>         __dev_queue_xmit+798
>         bond_dev_queue_xmit+43
>         __bond_start_xmit+211
>         bond_start_xmit+70
>         dev_hard_start_xmit+142
>         sch_direct_xmit+161
>         __qdisc_run+102             <<<<< Issue location
>         __dev_xmit_skb+1015
>         __dev_queue_xmit+637
>         neigh_hh_output+159
>         ip_finish_output2+461
>         __ip_finish_output+183
>         ip_finish_output+41
>         ip_output+120
>         ip_local_out+94
>         __ip_queue_xmit+394
>         ip_queue_xmit+21
>         __tcp_transmit_skb+2169
>         tcp_write_xmit+959
>         __tcp_push_pending_frames+55
>         tcp_push+264
>         tcp_sendmsg_locked+661
>         tcp_sendmsg+45
>         inet_sendmsg+67
>         sock_sendmsg+98
>         sock_write_iter+147
>         vfs_write+786
>         ksys_write+181
>         __x64_sys_write+25
>         do_syscall_64+56
>         entry_SYSCALL_64_after_hwframe+100
> 
> The problem occurs when multiple tasks share a single qdisc. In such cases,
> __qdisc_run() may transmit skbs created by different tasks. Consequently,
> task_get_classid() retrieves an incorrect classid since it references the
> current task's context rather than the skb's originating task.
> 
> Given that dev_queue_xmit() always executes with bh disabled, we can safely
> use in_softirq() instead of in_serving_softirq() to properly identify the
> softirq context and obtain the correct classid.
>
 
nit: you are no longer using in_softirq() in v2, you should update the
commit message as well.

[snip]

Cheers,
 Nik


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
  2025-08-22  7:26 ` Nikolay Aleksandrov
@ 2025-08-22  7:34   ` Yafang Shao
  0 siblings, 0 replies; 7+ messages in thread
From: Yafang Shao @ 2025-08-22  7:34 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: davem, edumazet, kuba, pabeni, horms, daniel, bigeasy, tgraf,
	paulmck, netdev, bpf

On Fri, Aug 22, 2025 at 3:26 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 8/22/25 09:42, Yafang Shao wrote:
> > During recent testing with the netem qdisc to inject delays into TCP
> > traffic, we observed that our CLS BPF program failed to function correctly
> > due to incorrect classid retrieval from task_get_classid(). The issue
> > manifests in the following call stack:
> >
> >         bpf_get_cgroup_classid+5
> >         cls_bpf_classify+507
> >         __tcf_classify+90
> >         tcf_classify+217
> >         __dev_queue_xmit+798
> >         bond_dev_queue_xmit+43
> >         __bond_start_xmit+211
> >         bond_start_xmit+70
> >         dev_hard_start_xmit+142
> >         sch_direct_xmit+161
> >         __qdisc_run+102             <<<<< Issue location
> >         __dev_xmit_skb+1015
> >         __dev_queue_xmit+637
> >         neigh_hh_output+159
> >         ip_finish_output2+461
> >         __ip_finish_output+183
> >         ip_finish_output+41
> >         ip_output+120
> >         ip_local_out+94
> >         __ip_queue_xmit+394
> >         ip_queue_xmit+21
> >         __tcp_transmit_skb+2169
> >         tcp_write_xmit+959
> >         __tcp_push_pending_frames+55
> >         tcp_push+264
> >         tcp_sendmsg_locked+661
> >         tcp_sendmsg+45
> >         inet_sendmsg+67
> >         sock_sendmsg+98
> >         sock_write_iter+147
> >         vfs_write+786
> >         ksys_write+181
> >         __x64_sys_write+25
> >         do_syscall_64+56
> >         entry_SYSCALL_64_after_hwframe+100
> >
> > The problem occurs when multiple tasks share a single qdisc. In such cases,
> > __qdisc_run() may transmit skbs created by different tasks. Consequently,
> > task_get_classid() retrieves an incorrect classid since it references the
> > current task's context rather than the skb's originating task.
> >
> > Given that dev_queue_xmit() always executes with bh disabled, we can safely
> > use in_softirq() instead of in_serving_softirq() to properly identify the
> > softirq context and obtain the correct classid.
> >
>
> nit: you are no longer using in_softirq() in v2, you should update the
> commit message as well.

Oh, my bad.
I will update it.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
  2025-08-22  6:42 [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run Yafang Shao
  2025-08-22  7:26 ` Nikolay Aleksandrov
@ 2025-08-28  7:55 ` Paolo Abeni
  2025-08-29  3:23   ` Yafang Shao
  1 sibling, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2025-08-28  7:55 UTC (permalink / raw)
  To: Yafang Shao, davem, edumazet, kuba, horms, daniel, bigeasy, tgraf,
	paulmck
  Cc: netdev, bpf

On 8/22/25 8:42 AM, Yafang Shao wrote:
> During recent testing with the netem qdisc to inject delays into TCP
> traffic, we observed that our CLS BPF program failed to function correctly
> due to incorrect classid retrieval from task_get_classid(). The issue
> manifests in the following call stack:
> 
>         bpf_get_cgroup_classid+5
>         cls_bpf_classify+507
>         __tcf_classify+90
>         tcf_classify+217
>         __dev_queue_xmit+798
>         bond_dev_queue_xmit+43
>         __bond_start_xmit+211
>         bond_start_xmit+70
>         dev_hard_start_xmit+142
>         sch_direct_xmit+161
>         __qdisc_run+102             <<<<< Issue location
>         __dev_xmit_skb+1015
>         __dev_queue_xmit+637
>         neigh_hh_output+159
>         ip_finish_output2+461
>         __ip_finish_output+183
>         ip_finish_output+41
>         ip_output+120
>         ip_local_out+94
>         __ip_queue_xmit+394
>         ip_queue_xmit+21
>         __tcp_transmit_skb+2169
>         tcp_write_xmit+959
>         __tcp_push_pending_frames+55
>         tcp_push+264
>         tcp_sendmsg_locked+661
>         tcp_sendmsg+45
>         inet_sendmsg+67
>         sock_sendmsg+98
>         sock_write_iter+147
>         vfs_write+786
>         ksys_write+181
>         __x64_sys_write+25
>         do_syscall_64+56
>         entry_SYSCALL_64_after_hwframe+100
> 
> The problem occurs when multiple tasks share a single qdisc. In such cases,
> __qdisc_run() may transmit skbs created by different tasks. Consequently,
> task_get_classid() retrieves an incorrect classid since it references the
> current task's context rather than the skb's originating task.
> 
> Given that dev_queue_xmit() always executes with bh disabled, we can safely
> use in_softirq() instead of in_serving_softirq() to properly identify the
> softirq context and obtain the correct classid.
> 
> The simple steps to reproduce this issue:
> 1. Add network delay to the network interface:
>   such as: tc qdisc add dev bond0 root netem delay 1.5ms
> 2. Create two distinct net_cls cgroups, each running a network-intensive task
> 3. Initiate parallel TCP streams from both tasks to external servers.
> 
> Under this specific condition, the issue reliably occurs. The kernel
> eventually dequeues an SKB that originated from Task-A while executing in
> the context of Task-B.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> ---
> 
> v1->v2: use softirq_count() instead of in_softirq()
> ---
>  include/net/cls_cgroup.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
> index 7e78e7d6f015..668aeee9b3f6 100644
> --- a/include/net/cls_cgroup.h
> +++ b/include/net/cls_cgroup.h
> @@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb)
>  	 * calls by looking at the number of nested bh disable calls because
>  	 * softirqs always disables bh.
>  	 */
> -	if (in_serving_softirq()) {
> +	if (softirq_count()) {
>  		struct sock *sk = skb_to_full_sk(skb);
>  
>  		/* If there is an sock_cgroup_classid we'll use that. */

AFAICS the above changes the established behavior for a slightly
different scenario:

<sock S is created by task A>
<class ID for task A is changed>
<skb is created by sock S xmit and classified>

prior to this patch the skb will be classified with the 'new' task A
classid, now with the old/original one.

I'm unsure if such behavior change is acceptable; I think at very least
it should be mentioned in the changelog and likely this change should
target net-next.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
  2025-08-28  7:55 ` Paolo Abeni
@ 2025-08-29  3:23   ` Yafang Shao
  2025-08-29  8:14     ` Daniel Borkmann
  0 siblings, 1 reply; 7+ messages in thread
From: Yafang Shao @ 2025-08-29  3:23 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: davem, edumazet, kuba, horms, daniel, bigeasy, tgraf, paulmck,
	netdev, bpf

On Thu, Aug 28, 2025 at 3:55 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 8/22/25 8:42 AM, Yafang Shao wrote:
> > During recent testing with the netem qdisc to inject delays into TCP
> > traffic, we observed that our CLS BPF program failed to function correctly
> > due to incorrect classid retrieval from task_get_classid(). The issue
> > manifests in the following call stack:
> >
> >         bpf_get_cgroup_classid+5
> >         cls_bpf_classify+507
> >         __tcf_classify+90
> >         tcf_classify+217
> >         __dev_queue_xmit+798
> >         bond_dev_queue_xmit+43
> >         __bond_start_xmit+211
> >         bond_start_xmit+70
> >         dev_hard_start_xmit+142
> >         sch_direct_xmit+161
> >         __qdisc_run+102             <<<<< Issue location
> >         __dev_xmit_skb+1015
> >         __dev_queue_xmit+637
> >         neigh_hh_output+159
> >         ip_finish_output2+461
> >         __ip_finish_output+183
> >         ip_finish_output+41
> >         ip_output+120
> >         ip_local_out+94
> >         __ip_queue_xmit+394
> >         ip_queue_xmit+21
> >         __tcp_transmit_skb+2169
> >         tcp_write_xmit+959
> >         __tcp_push_pending_frames+55
> >         tcp_push+264
> >         tcp_sendmsg_locked+661
> >         tcp_sendmsg+45
> >         inet_sendmsg+67
> >         sock_sendmsg+98
> >         sock_write_iter+147
> >         vfs_write+786
> >         ksys_write+181
> >         __x64_sys_write+25
> >         do_syscall_64+56
> >         entry_SYSCALL_64_after_hwframe+100
> >
> > The problem occurs when multiple tasks share a single qdisc. In such cases,
> > __qdisc_run() may transmit skbs created by different tasks. Consequently,
> > task_get_classid() retrieves an incorrect classid since it references the
> > current task's context rather than the skb's originating task.
> >
> > Given that dev_queue_xmit() always executes with bh disabled, we can safely
> > use in_softirq() instead of in_serving_softirq() to properly identify the
> > softirq context and obtain the correct classid.
> >
> > The simple steps to reproduce this issue:
> > 1. Add network delay to the network interface:
> >   such as: tc qdisc add dev bond0 root netem delay 1.5ms
> > 2. Create two distinct net_cls cgroups, each running a network-intensive task
> > 3. Initiate parallel TCP streams from both tasks to external servers.
> >
> > Under this specific condition, the issue reliably occurs. The kernel
> > eventually dequeues an SKB that originated from Task-A while executing in
> > the context of Task-B.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Thomas Graf <tgraf@suug.ch>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > ---
> >
> > v1->v2: use softirq_count() instead of in_softirq()
> > ---
> >  include/net/cls_cgroup.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
> > index 7e78e7d6f015..668aeee9b3f6 100644
> > --- a/include/net/cls_cgroup.h
> > +++ b/include/net/cls_cgroup.h
> > @@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb)
> >        * calls by looking at the number of nested bh disable calls because
> >        * softirqs always disables bh.
> >        */
> > -     if (in_serving_softirq()) {
> > +     if (softirq_count()) {
> >               struct sock *sk = skb_to_full_sk(skb);
> >
> >               /* If there is an sock_cgroup_classid we'll use that. */
>
> AFAICS the above changes the established behavior for a slightly
> different scenario:

right.

>
> <sock S is created by task A>
> <class ID for task A is changed>
> <skb is created by sock S xmit and classified>
>
> prior to this patch the skb will be classified with the 'new' task A
> classid, now with the old/original one.
>
> I'm unsure if such behavior change is acceptable;

The classid of a skb is only meaningful within its original network
context, not from a random task.

> I think at very least
> it should be mentioned in the changelog and likely this change should
> target net-next.

Will add this to the commit log and tag it for net-next in the next version.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
  2025-08-29  3:23   ` Yafang Shao
@ 2025-08-29  8:14     ` Daniel Borkmann
  2025-08-31  3:20       ` Yafang Shao
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2025-08-29  8:14 UTC (permalink / raw)
  To: Yafang Shao, Paolo Abeni
  Cc: davem, edumazet, kuba, horms, bigeasy, tgraf, paulmck, netdev,
	bpf, Martin KaFai Lau

On 8/29/25 5:23 AM, Yafang Shao wrote:
> On Thu, Aug 28, 2025 at 3:55 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 8/22/25 8:42 AM, Yafang Shao wrote:
>>> During recent testing with the netem qdisc to inject delays into TCP
>>> traffic, we observed that our CLS BPF program failed to function correctly
>>> due to incorrect classid retrieval from task_get_classid(). The issue
>>> manifests in the following call stack:
>>>
>>>          bpf_get_cgroup_classid+5
>>>          cls_bpf_classify+507
>>>          __tcf_classify+90
>>>          tcf_classify+217
>>>          __dev_queue_xmit+798
>>>          bond_dev_queue_xmit+43
>>>          __bond_start_xmit+211
>>>          bond_start_xmit+70
>>>          dev_hard_start_xmit+142
>>>          sch_direct_xmit+161
>>>          __qdisc_run+102             <<<<< Issue location
>>>          __dev_xmit_skb+1015
>>>          __dev_queue_xmit+637
>>>          neigh_hh_output+159
>>>          ip_finish_output2+461
>>>          __ip_finish_output+183
>>>          ip_finish_output+41
>>>          ip_output+120
>>>          ip_local_out+94
>>>          __ip_queue_xmit+394
>>>          ip_queue_xmit+21
>>>          __tcp_transmit_skb+2169
>>>          tcp_write_xmit+959
>>>          __tcp_push_pending_frames+55
>>>          tcp_push+264
>>>          tcp_sendmsg_locked+661
>>>          tcp_sendmsg+45
>>>          inet_sendmsg+67
>>>          sock_sendmsg+98
>>>          sock_write_iter+147
>>>          vfs_write+786
>>>          ksys_write+181
>>>          __x64_sys_write+25
>>>          do_syscall_64+56
>>>          entry_SYSCALL_64_after_hwframe+100
>>>
>>> The problem occurs when multiple tasks share a single qdisc. In such cases,
>>> __qdisc_run() may transmit skbs created by different tasks. Consequently,
>>> task_get_classid() retrieves an incorrect classid since it references the
>>> current task's context rather than the skb's originating task.
>>>
>>> Given that dev_queue_xmit() always executes with bh disabled, we can safely
>>> use in_softirq() instead of in_serving_softirq() to properly identify the
>>> softirq context and obtain the correct classid.
>>>
>>> The simple steps to reproduce this issue:
>>> 1. Add network delay to the network interface:
>>>    such as: tc qdisc add dev bond0 root netem delay 1.5ms
>>> 2. Create two distinct net_cls cgroups, each running a network-intensive task
>>> 3. Initiate parallel TCP streams from both tasks to external servers.
>>>
>>> Under this specific condition, the issue reliably occurs. The kernel
>>> eventually dequeues an SKB that originated from Task-A while executing in
>>> the context of Task-B.
>>>
>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>> Cc: Thomas Graf <tgraf@suug.ch>
>>> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>>
>>> v1->v2: use softirq_count() instead of in_softirq()
>>> ---
>>>   include/net/cls_cgroup.h | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
>>> index 7e78e7d6f015..668aeee9b3f6 100644
>>> --- a/include/net/cls_cgroup.h
>>> +++ b/include/net/cls_cgroup.h
>>> @@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb)
>>>         * calls by looking at the number of nested bh disable calls because
>>>         * softirqs always disables bh.
>>>         */
>>> -     if (in_serving_softirq()) {
>>> +     if (softirq_count()) {
>>>                struct sock *sk = skb_to_full_sk(skb);
>>>
>>>                /* If there is an sock_cgroup_classid we'll use that. */
>>
>> AFAICS the above changes the established behavior for a slightly
>> different scenario:
> 
> right.
> 
>> <sock S is created by task A>
>> <class ID for task A is changed>
>> <skb is created by sock S xmit and classified>
>>
>> prior to this patch the skb will be classified with the 'new' task A
>> classid, now with the old/original one.
>>
>> I'm unsure if such behavior change is acceptable;
> 
> The classid of a skb is only meaningful within its original network
> context, not from a random task.

Do you mean by original network context original netns? We also have
bpf_skb_cgroup_classid() as well as bpf_get_cgroup_classid_curr(), both
exposed to tcx, which kind of detangles what task_get_classid() is doing.
I guess if you have apps in its own netns and the skb->sk is retained all
the way to phys dev in hostns then bpf_skb_cgroup_classid() might be a
better choice (assuming classid stays constant from container orchestrator
PoV).

>> I think at very least
>> it should be mentioned in the changelog and likely this change should
>> target net-next.
> 
> Will add this to the commit log and tag it for net-next in the next version.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run
  2025-08-29  8:14     ` Daniel Borkmann
@ 2025-08-31  3:20       ` Yafang Shao
  0 siblings, 0 replies; 7+ messages in thread
From: Yafang Shao @ 2025-08-31  3:20 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Paolo Abeni, davem, edumazet, kuba, horms, bigeasy, tgraf,
	paulmck, netdev, bpf, Martin KaFai Lau

On Fri, Aug 29, 2025 at 4:14 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/29/25 5:23 AM, Yafang Shao wrote:
> > On Thu, Aug 28, 2025 at 3:55 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 8/22/25 8:42 AM, Yafang Shao wrote:
> >>> During recent testing with the netem qdisc to inject delays into TCP
> >>> traffic, we observed that our CLS BPF program failed to function correctly
> >>> due to incorrect classid retrieval from task_get_classid(). The issue
> >>> manifests in the following call stack:
> >>>
> >>>          bpf_get_cgroup_classid+5
> >>>          cls_bpf_classify+507
> >>>          __tcf_classify+90
> >>>          tcf_classify+217
> >>>          __dev_queue_xmit+798
> >>>          bond_dev_queue_xmit+43
> >>>          __bond_start_xmit+211
> >>>          bond_start_xmit+70
> >>>          dev_hard_start_xmit+142
> >>>          sch_direct_xmit+161
> >>>          __qdisc_run+102             <<<<< Issue location
> >>>          __dev_xmit_skb+1015
> >>>          __dev_queue_xmit+637
> >>>          neigh_hh_output+159
> >>>          ip_finish_output2+461
> >>>          __ip_finish_output+183
> >>>          ip_finish_output+41
> >>>          ip_output+120
> >>>          ip_local_out+94
> >>>          __ip_queue_xmit+394
> >>>          ip_queue_xmit+21
> >>>          __tcp_transmit_skb+2169
> >>>          tcp_write_xmit+959
> >>>          __tcp_push_pending_frames+55
> >>>          tcp_push+264
> >>>          tcp_sendmsg_locked+661
> >>>          tcp_sendmsg+45
> >>>          inet_sendmsg+67
> >>>          sock_sendmsg+98
> >>>          sock_write_iter+147
> >>>          vfs_write+786
> >>>          ksys_write+181
> >>>          __x64_sys_write+25
> >>>          do_syscall_64+56
> >>>          entry_SYSCALL_64_after_hwframe+100
> >>>
> >>> The problem occurs when multiple tasks share a single qdisc. In such cases,
> >>> __qdisc_run() may transmit skbs created by different tasks. Consequently,
> >>> task_get_classid() retrieves an incorrect classid since it references the
> >>> current task's context rather than the skb's originating task.
> >>>
> >>> Given that dev_queue_xmit() always executes with bh disabled, we can safely
> >>> use in_softirq() instead of in_serving_softirq() to properly identify the
> >>> softirq context and obtain the correct classid.
> >>>
> >>> The simple steps to reproduce this issue:
> >>> 1. Add network delay to the network interface:
> >>>    such as: tc qdisc add dev bond0 root netem delay 1.5ms
> >>> 2. Create two distinct net_cls cgroups, each running a network-intensive task
> >>> 3. Initiate parallel TCP streams from both tasks to external servers.
> >>>
> >>> Under this specific condition, the issue reliably occurs. The kernel
> >>> eventually dequeues an SKB that originated from Task-A while executing in
> >>> the context of Task-B.
> >>>
> >>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >>> Cc: Daniel Borkmann <daniel@iogearbox.net>
> >>> Cc: Thomas Graf <tgraf@suug.ch>
> >>> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >>>
> >>> v1->v2: use softirq_count() instead of in_softirq()
> >>> ---
> >>>   include/net/cls_cgroup.h | 2 +-
> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
> >>> index 7e78e7d6f015..668aeee9b3f6 100644
> >>> --- a/include/net/cls_cgroup.h
> >>> +++ b/include/net/cls_cgroup.h
> >>> @@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb)
> >>>         * calls by looking at the number of nested bh disable calls because
> >>>         * softirqs always disables bh.
> >>>         */
> >>> -     if (in_serving_softirq()) {
> >>> +     if (softirq_count()) {
> >>>                struct sock *sk = skb_to_full_sk(skb);
> >>>
> >>>                /* If there is an sock_cgroup_classid we'll use that. */
> >>
> >> AFAICS the above changes the established behavior for a slightly
> >> different scenario:
> >
> > right.
> >
> >> <sock S is created by task A>
> >> <class ID for task A is changed>
> >> <skb is created by sock S xmit and classified>
> >>
> >> prior to this patch the skb will be classified with the 'new' task A
> >> classid, now with the old/original one.
> >>
> >> I'm unsure if such behavior change is acceptable;
> >
> > The classid of a skb is only meaningful within its original network
> > context, not from a random task.
>
> Do you mean by original network context original netns? We also have
> bpf_skb_cgroup_classid() as well as bpf_get_cgroup_classid_curr(), both
> exposed to tcx, which kind of detangles what task_get_classid() is doing.
> I guess if you have apps in its own netns and the skb->sk is retained all
> the way to phys dev in hostns then bpf_skb_cgroup_classid() might be a
> better choice (assuming classid stays constant from container orchestrator
> PoV).

Right. We have replaced bpf_get_cgroup_classid() with
bpf_skb_cgroup_classid() to handle this case. Nonetheless, I believe
we still need to fix bpf_get_cgroup_classid(), since this function can
easily mislead users.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-31  3:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22  6:42 [PATCH v2] net/cls_cgroup: Fix task_get_classid() during qdisc run Yafang Shao
2025-08-22  7:26 ` Nikolay Aleksandrov
2025-08-22  7:34   ` Yafang Shao
2025-08-28  7:55 ` Paolo Abeni
2025-08-29  3:23   ` Yafang Shao
2025-08-29  8:14     ` Daniel Borkmann
2025-08-31  3:20       ` Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).