netdev.vger.kernel.org archive mirror
* [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
@ 2025-12-10  5:31 Dharanitharan R
  2025-12-10 12:51 ` Simon Horman
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dharanitharan R @ 2025-12-10  5:31 UTC
  To: syzbot+422806e5f4cce722a71f; +Cc: netdev, linux-kernel, dharanitharan725

In __team_queue_override_port_del(), repeated deletion of the same port
using list_del_rcu() could corrupt the RCU-protected qom_list. This
happens if the function is called multiple times on the same port, for
example during port removal or team reconfiguration.

This patch replaces list_del_rcu() with list_del_init_rcu() to:

  - Ensure safe repeated deletion of the same port
  - Keep the RCU list consistent
  - Avoid potential use-after-free and list corruption issues

Testing:
  - Syzbot-reported crash is eliminated in testing.
  - Kernel builds and runs cleanly

Fixes: 108f9405ce81 ("team: add queue override configuration mechanism")
Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f
Signed-off-by: Dharanitharan R <dharanitharan725@gmail.com>
---
 drivers/net/team/team_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index 4d5c9ae8f221..d6d724b52dbf 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -823,7 +823,8 @@ static void __team_queue_override_port_del(struct team *team,
 {
 	if (!port->queue_id)
 		return;
-	list_del_rcu(&port->qom_list);
+	/* Ensure safe repeated deletion */
+	list_del_init_rcu(&port->qom_list);
 }
 
 static bool team_queue_override_port_has_gt_prio_than(struct team_port *port,
-- 
2.43.0


* Re: [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
  2025-12-10  5:31 [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu() Dharanitharan R
@ 2025-12-10 12:51 ` Simon Horman
  2025-12-11  9:38   ` Jiri Pirko
  2025-12-12 10:11 ` Jiri Pirko
  2025-12-16  5:20 ` kernel test robot
  2 siblings, 1 reply; 7+ messages in thread
From: Simon Horman @ 2025-12-10 12:51 UTC
  To: Dharanitharan R; +Cc: syzbot+422806e5f4cce722a71f, netdev, linux-kernel

On Wed, Dec 10, 2025 at 05:31:05AM +0000, Dharanitharan R wrote:
> In __team_queue_override_port_del(), repeated deletion of the same port
> using list_del_rcu() could corrupt the RCU-protected qom_list. This
> happens if the function is called multiple times on the same port, for
> example during port removal or team reconfiguration.
> 
> This patch replaces list_del_rcu() with list_del_init_rcu() to:
> 
>   - Ensure safe repeated deletion of the same port
>   - Keep the RCU list consistent
>   - Avoid potential use-after-free and list corruption issues
> 
> Testing:
>   - Syzbot-reported crash is eliminated in testing.
>   - Kernel builds and runs cleanly
> 
> Fixes: 108f9405ce81 ("team: add queue override configuration mechanism")
> Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f
> Signed-off-by: Dharanitharan R <dharanitharan725@gmail.com>

Thanks for addressing my review of v1.
The commit message looks much better to me.

However, I am unable to find the cited commit in net.

And I am still curious about the cause: are you sure it is repeated deletion?

> ---
>  drivers/net/team/team_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> index 4d5c9ae8f221..d6d724b52dbf 100644
> --- a/drivers/net/team/team_core.c
> +++ b/drivers/net/team/team_core.c
> @@ -823,7 +823,8 @@ static void __team_queue_override_port_del(struct team *team,
>  {
>  	if (!port->queue_id)
>  		return;
> -	list_del_rcu(&port->qom_list);
> +	/* Ensure safe repeated deletion */
> +	list_del_init_rcu(&port->qom_list);

When applied against net this does not compile
as list_del_init_rcu (as opposed to hlist_del_init_rcu) does
not seem to exist in that tree. Am I missing something?
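
For context, here is a minimal userspace sketch of why deleting the same
entry twice is a problem in the first place. It assumes that list_del_rcu()
unlinks the entry and poisons only ->prev (leaving ->next for concurrent
readers), and that a debug-list build validates the entry before unlinking.
All names below (struct node, model_*, MODEL_POISON2) are hypothetical
stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

struct node { struct node *next, *prev; };

/* Stand-in for the kernel's LIST_POISON2; the address is arbitrary here. */
static struct node model_poison2;
#define MODEL_POISON2 (&model_poison2)

void model_list_init(struct node *head)
{
	head->next = head->prev = head;
}

void model_list_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Models a validated delete: returns false, leaving the list untouched,
 * if the entry is already poisoned or its neighbours do not point back.
 */
bool model_list_del_rcu(struct node *n)
{
	if (n->prev == MODEL_POISON2 ||
	    n->prev->next != n || n->next->prev != n)
		return false;
	n->next->prev = n->prev;
	n->prev->next = n->next;
	n->prev = MODEL_POISON2;	/* ->next is deliberately left intact */
	return true;
}

/* Returns 1 if the second deletion of the same entry is rejected. */
int model_double_delete(void)
{
	struct node head, a;
	bool first;

	model_list_init(&head);
	model_list_add(&a, &head);
	first = model_list_del_rcu(&a);
	assert(first);
	(void)first;
	return model_list_del_rcu(&a) ? 0 : 1;
}
```

In this model the second model_list_del_rcu() call on the same entry is
refused because ->prev already holds the poison value; without such a check
the unlink would write through the poisoned pointer.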

>  }
>  
>  static bool team_queue_override_port_has_gt_prio_than(struct team_port *port,
> -- 
> 2.43.0

-- 
pw-bot: changes-requested


* Re: [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
  2025-12-10 12:51 ` Simon Horman
@ 2025-12-11  9:38   ` Jiri Pirko
  2025-12-11  9:38     ` syzbot
  2025-12-11 16:14     ` Simon Horman
  0 siblings, 2 replies; 7+ messages in thread
From: Jiri Pirko @ 2025-12-11  9:38 UTC
  To: Simon Horman
  Cc: Dharanitharan R, syzbot+422806e5f4cce722a71f, netdev,
	linux-kernel

Wed, Dec 10, 2025 at 01:51:39PM +0100, horms@kernel.org wrote:
>On Wed, Dec 10, 2025 at 05:31:05AM +0000, Dharanitharan R wrote:
>> In __team_queue_override_port_del(), repeated deletion of the same port
>> using list_del_rcu() could corrupt the RCU-protected qom_list. This
>> happens if the function is called multiple times on the same port, for
>> example during port removal or team reconfiguration.
>> 
>> This patch replaces list_del_rcu() with list_del_init_rcu() to:
>> 
>>   - Ensure safe repeated deletion of the same port
>>   - Keep the RCU list consistent
>>   - Avoid potential use-after-free and list corruption issues
>> 
>> Testing:
>>   - Syzbot-reported crash is eliminated in testing.
>>   - Kernel builds and runs cleanly
>> 
>> Fixes: 108f9405ce81 ("team: add queue override configuration mechanism")
>> Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f
>> Signed-off-by: Dharanitharan R <dharanitharan725@gmail.com>
>
>Thanks for addressing my review of v1.
>The commit message looks much better to me.
>
>However, I am unable to find the cited commit in net.
>
>And I am still curious about the cause: are you sure it is repeated deletion?

It looks like it is. But I believe we need to fix the root cause, i.e. why
list_del is called twice, and not blindly take an AI-made fix with an
AI-made patch description :O

I actually think that the following path might be the problematic one:
1) Port is enabled, queue_id != 0, in qom_list
2) Port gets disabled
	-> team_port_disable()
	-> team_queue_override_port_del()
	-> del (removed from list)
3) Port is disabled, queue_id != 0, not in any list
4) Priority changes
	-> team_queue_override_port_prio_changed()
	-> checks: port disabled && queue_id != 0
	-> calls del - hits the BUG as it is removed already

Will test the fix and submit shortly.

#syz test

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index 4d5c9ae8f221..c08a5c1bd6e4 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -878,7 +878,7 @@ static void __team_queue_override_enabled_check(struct team *team)
 static void team_queue_override_port_prio_changed(struct team *team,
 						  struct team_port *port)
 {
-	if (!port->queue_id || team_port_enabled(port))
+	if (!port->queue_id || !team_port_enabled(port))
 		return;
 	__team_queue_override_port_del(team, port);
 	__team_queue_override_port_add(team, port);
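
The four steps above, and the effect of flipping the enabled check, can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: struct model_port and the model_* helpers are
hypothetical, the on_qom_list flag stands in for real list membership, and
the "fixed" branch mirrors the one-line change in the diff above rather than
any merged code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a team port; not the kernel struct. */
struct model_port {
	bool enabled;		/* stands in for team_port_enabled(port) */
	int queue_id;		/* non-zero: queue override is configured */
	bool on_qom_list;	/* true while linked into qom_list */
	int double_dels;	/* deletions of an already-unlinked port */
};

static void model_del(struct model_port *p)
{
	if (!p->queue_id)
		return;
	if (!p->on_qom_list)
		p->double_dels++;	/* would be the second list_del_rcu() */
	p->on_qom_list = false;
}

static void model_add(struct model_port *p)
{
	if (!p->queue_id)
		return;
	assert(!p->on_qom_list);	/* model invariant: never add twice */
	p->on_qom_list = true;
}

static void model_disable(struct model_port *p)
{
	p->enabled = false;
	model_del(p);
}

/* fixed == false models the check before the change, true after it. */
static void model_prio_changed(struct model_port *p, bool fixed)
{
	bool skip = fixed ? (!p->queue_id || !p->enabled)
			  : (!p->queue_id || p->enabled);

	if (skip)
		return;
	model_del(p);
	model_add(p);
}

/* Runs steps 1-4 and returns how often an unlinked port was deleted. */
int model_run(bool fixed)
{
	struct model_port p = {
		.enabled = true, .queue_id = 1,
		.on_qom_list = true, .double_dels = 0,
	};

	model_disable(&p);		/* steps 2-3 */
	model_prio_changed(&p, fixed);	/* step 4 */
	return p.double_dels;
}
```

In this model, model_run(false) reports one deletion of an already-unlinked
port, while model_run(true) reports none because the priority change returns
early for a disabled port.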


* Re: [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
  2025-12-11  9:38   ` Jiri Pirko
@ 2025-12-11  9:38     ` syzbot
  2025-12-11 16:14     ` Simon Horman
  1 sibling, 0 replies; 7+ messages in thread
From: syzbot @ 2025-12-11  9:38 UTC
  To: jiri; +Cc: dharanitharan725, horms, jiri, linux-kernel, netdev,
	syzkaller-bugs

> Wed, Dec 10, 2025 at 01:51:39PM +0100, horms@kernel.org wrote:
>>On Wed, Dec 10, 2025 at 05:31:05AM +0000, Dharanitharan R wrote:
>>> In __team_queue_override_port_del(), repeated deletion of the same port
>>> using list_del_rcu() could corrupt the RCU-protected qom_list. This
>>> happens if the function is called multiple times on the same port, for
>>> example during port removal or team reconfiguration.
>>> 
>>> This patch replaces list_del_rcu() with list_del_init_rcu() to:
>>> 
>>>   - Ensure safe repeated deletion of the same port
>>>   - Keep the RCU list consistent
>>>   - Avoid potential use-after-free and list corruption issues
>>> 
>>> Testing:
>>>   - Syzbot-reported crash is eliminated in testing.
>>>   - Kernel builds and runs cleanly
>>> 
>>> Fixes: 108f9405ce81 ("team: add queue override configuration mechanism")
>>> Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f
>>> Signed-off-by: Dharanitharan R <dharanitharan725@gmail.com>
>>
>>Thanks for addressing my review of v1.
>>The commit message looks much better to me.
>>
>>However, I am unable to find the cited commit in net.
>>
>>And I am still curious about the cause: are you sure it is repeated deletion?
>
> It looks like it is. But I believe we need to fix the root cause, i.e. why
> list_del is called twice, and not blindly take an AI-made fix with an
> AI-made patch description :O
>
> I actually think that the following path might be the problematic one:
> 1) Port is enabled, queue_id != 0, in qom_list
> 2) Port gets disabled
> 	-> team_port_disable()
>         -> team_queue_override_port_del()
>         -> del (removed from list)
> 3) Port is disabled, queue_id != 0, not in any list
> 4) Priority changes
>         -> team_queue_override_port_prio_changed()
> 	-> checks: port disabled && queue_id != 0
>         -> calls del - hits the BUG as it is removed already
>
> Will test the fix and submit shortly.
>
> #syz test

This crash does not have a reproducer. I cannot test it.

>
> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> index 4d5c9ae8f221..c08a5c1bd6e4 100644
> --- a/drivers/net/team/team_core.c
> +++ b/drivers/net/team/team_core.c
> @@ -878,7 +878,7 @@ static void __team_queue_override_enabled_check(struct team *team)
>  static void team_queue_override_port_prio_changed(struct team *team,
>  						  struct team_port *port)
>  {
> -	if (!port->queue_id || team_port_enabled(port))
> +	if (!port->queue_id || !team_port_enabled(port))
>  		return;
>  	__team_queue_override_port_del(team, port);
>  	__team_queue_override_port_add(team, port);


* Re: [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
  2025-12-11  9:38   ` Jiri Pirko
  2025-12-11  9:38     ` syzbot
@ 2025-12-11 16:14     ` Simon Horman
  1 sibling, 0 replies; 7+ messages in thread
From: Simon Horman @ 2025-12-11 16:14 UTC
  To: Jiri Pirko
  Cc: Dharanitharan R, syzbot+422806e5f4cce722a71f, netdev,
	linux-kernel

On Thu, Dec 11, 2025 at 10:38:43AM +0100, Jiri Pirko wrote:
> Wed, Dec 10, 2025 at 01:51:39PM +0100, horms@kernel.org wrote:
> >On Wed, Dec 10, 2025 at 05:31:05AM +0000, Dharanitharan R wrote:
> >> In __team_queue_override_port_del(), repeated deletion of the same port
> >> using list_del_rcu() could corrupt the RCU-protected qom_list. This
> >> happens if the function is called multiple times on the same port, for
> >> example during port removal or team reconfiguration.
> >> 
> >> This patch replaces list_del_rcu() with list_del_init_rcu() to:
> >> 
> >>   - Ensure safe repeated deletion of the same port
> >>   - Keep the RCU list consistent
> >>   - Avoid potential use-after-free and list corruption issues
> >> 
> >> Testing:
> >>   - Syzbot-reported crash is eliminated in testing.
> >>   - Kernel builds and runs cleanly
> >> 
> >> Fixes: 108f9405ce81 ("team: add queue override configuration mechanism")
> >> Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f
> >> Signed-off-by: Dharanitharan R <dharanitharan725@gmail.com>
> >
> >Thanks for addressing my review of v1.
> >The commit message looks much better to me.
> >
> >However, I am unable to find the cited commit in net.
> >
> >And I am still curious about the cause: are you sure it is repeated deletion?
> 
> It looks like it is. But I believe we need to fix the root cause, i.e. why
> list_del is called twice, and not blindly take an AI-made fix with an
> AI-made patch description :O
> 
> I actually think that the following path might be the problematic one:
> 1) Port is enabled, queue_id != 0, in qom_list
> 2) Port gets disabled
> 	-> team_port_disable()
>         -> team_queue_override_port_del()
>         -> del (removed from list)
> 3) Port is disabled, queue_id != 0, not in any list
> 4) Priority changes
>         -> team_queue_override_port_prio_changed()
> 	-> checks: port disabled && queue_id != 0
>         -> calls del - hits the BUG as it is removed already
> 
> Will test the fix and submit shortly.

Thanks, much appreciated.

...


* Re: [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
  2025-12-10  5:31 [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu() Dharanitharan R
  2025-12-10 12:51 ` Simon Horman
@ 2025-12-12 10:11 ` Jiri Pirko
  2025-12-16  5:20 ` kernel test robot
  2 siblings, 0 replies; 7+ messages in thread
From: Jiri Pirko @ 2025-12-12 10:11 UTC
  To: Dharanitharan R; +Cc: syzbot+422806e5f4cce722a71f, netdev, linux-kernel

Wed, Dec 10, 2025 at 06:31:05AM +0100, dharanitharan725@gmail.com wrote:
>In __team_queue_override_port_del(), repeated deletion of the same port
>using list_del_rcu() could corrupt the RCU-protected qom_list. This
>happens if the function is called multiple times on the same port, for
>example during port removal or team reconfiguration.
>
>This patch replaces list_del_rcu() with list_del_init_rcu() to:
>
>  - Ensure safe repeated deletion of the same port
>  - Keep the RCU list consistent
>  - Avoid potential use-after-free and list corruption issues
>
>Testing:
>  - Syzbot-reported crash is eliminated in testing.
>  - Kernel builds and runs cleanly
>
>Fixes: 108f9405ce81 ("team: add queue override configuration mechanism")

Awesome, this commit is AI hallucinated. Can you do some basic checking
before you send this ****?


* Re: [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
  2025-12-10  5:31 [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu() Dharanitharan R
  2025-12-10 12:51 ` Simon Horman
  2025-12-12 10:11 ` Jiri Pirko
@ 2025-12-16  5:20 ` kernel test robot
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-12-16  5:20 UTC
  To: Dharanitharan R, syzbot+422806e5f4cce722a71f
  Cc: oe-kbuild-all, netdev, linux-kernel, dharanitharan725

Hi Dharanitharan,

kernel test robot noticed the following build errors:

[auto build test ERROR on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Dharanitharan-R/team-fix-qom_list-corruption-by-using-list_del_init_rcu/20251210-133429
base:   net/main
patch link:    https://lore.kernel.org/r/20251210053104.23608-2-dharanitharan725%40gmail.com
patch subject: [PATCH net v2] team: fix qom_list corruption by using list_del_init_rcu()
config: x86_64-rhel-9.4-ltp (https://download.01.org/0day-ci/archive/20251216/202512160610.CtwITAzk-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251216/202512160610.CtwITAzk-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512160610.CtwITAzk-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/net/team/team_core.c: In function '__team_queue_override_port_del':
>> drivers/net/team/team_core.c:827:9: error: implicit declaration of function 'list_del_init_rcu'; did you mean 'hlist_del_init_rcu'? [-Wimplicit-function-declaration]
     827 |         list_del_init_rcu(&port->qom_list);
         |         ^~~~~~~~~~~~~~~~~
         |         hlist_del_init_rcu


vim +827 drivers/net/team/team_core.c

   820	
   821	static void __team_queue_override_port_del(struct team *team,
   822						   struct team_port *port)
   823	{
   824		if (!port->queue_id)
   825			return;
   826		/* Ensure safe repeated deletion */
 > 827		list_del_init_rcu(&port->qom_list);
   828	}
   829	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

