netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 6.10/bisected/regression - commits 4e7aaa6b82d6 cause appearing WARNING at net/netfilter/ipset/ip_set_core.c:1200 suspicious rcu_dereference_protected() usage
@ 2024-06-20  6:55 Mikhail Gavrilov
  2024-06-20  8:42 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 2+ messages in thread
From: Mikhail Gavrilov @ 2024-06-20  6:55 UTC (permalink / raw)
  To: kadlec, nnamrec, Pablo Neira Ayuso, Linux List Kernel Mailing,
	Linux regressions mailing list, netdev

[-- Attachment #1: Type: text/plain, Size: 5557 bytes --]

Hi,
between 2ef5971ff345 and rc4 I spotted a new regression.
It is expressed in the appearance warning with stacktrace after one
minute after boot.

 =============================
 WARNING: suspicious RCU usage
 6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug #1 Tainted:
G        W    L    -------  ---
 -----------------------------
 net/netfilter/ipset/ip_set_core.c:1200 suspicious
rcu_dereference_protected() usage!

 other info that might help us debug this:


 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by kworker/u128:1/264:
  #0: ffff88810813c958 ((wq_completion)netns){+.+.}-{0:0}, at:
process_one_work+0xeab/0x1460
  #1: ffffc90001477da0 (net_cleanup_work){+.+.}-{0:0}, at:
process_one_work+0x82b/0x1460
  #2: ffffffff97d9ae98 (pernet_ops_rwsem){++++}-{3:3}, at:
cleanup_net+0xb9/0xa90

 stack backtrace:
 CPU: 30 PID: 264 Comm: kworker/u128:1 Tainted: G        W    L
-------  ---  6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug
#1
 Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING
WIFI, BIOS 2611 04/07/2024
 Workqueue: netns cleanup_net
 Call Trace:
  <TASK>
  dump_stack_lvl+0x84/0xd0
  lockdep_rcu_suspicious.cold+0xa1/0x134
  _destroy_all_sets+0x1c7/0x560 [ip_set]
  ip_set_net_exit+0x20/0x50 [ip_set]
  ops_exit_list+0x99/0x170
  cleanup_net+0x4d9/0xa90
  ? __pfx_cleanup_net+0x10/0x10
  process_one_work+0x8a4/0x1460
  ? worker_thread+0xe3/0x1010
  ? __pfx_process_one_work+0x10/0x10
  ? assign_work+0x16c/0x240
  worker_thread+0x5e6/0x1010
  ? __kthread_parkme+0xb1/0x1d0
  ? __pfx_worker_thread+0x10/0x10
  ? __pfx_worker_thread+0x10/0x10
  kthread+0x2d2/0x3a0
  ? _raw_spin_unlock_irq+0x28/0x60
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

 =============================
 WARNING: suspicious RCU usage
 6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug #1 Tainted:
G        W    L    -------  ---
 -----------------------------
 net/netfilter/ipset/ip_set_core.c:1211 suspicious
rcu_dereference_protected() usage!

 other info that might help us debug this:


 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by kworker/u128:1/264:
  #0: ffff88810813c958 ((wq_completion)netns){+.+.}-{0:0}, at:
process_one_work+0xeab/0x1460
  #1: ffffc90001477da0 (net_cleanup_work){+.+.}-{0:0}, at:
process_one_work+0x82b/0x1460
  #2: ffffffff97d9ae98 (pernet_ops_rwsem){++++}-{3:3}, at:
cleanup_net+0xb9/0xa90

 stack backtrace:
 CPU: 30 PID: 264 Comm: kworker/u128:1 Tainted: G        W    L
-------  ---  6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug
#1
 Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING
WIFI, BIOS 2611 04/07/2024
 Workqueue: netns cleanup_net
 Call Trace:
  <TASK>
  dump_stack_lvl+0x84/0xd0
  lockdep_rcu_suspicious.cold+0xa1/0x134
  _destroy_all_sets+0x3a8/0x560 [ip_set]
  ip_set_net_exit+0x20/0x50 [ip_set]
  ops_exit_list+0x99/0x170
  cleanup_net+0x4d9/0xa90
  ? __pfx_cleanup_net+0x10/0x10
  process_one_work+0x8a4/0x1460
  ? worker_thread+0xe3/0x1010
  ? __pfx_process_one_work+0x10/0x10
  ? assign_work+0x16c/0x240
  worker_thread+0x5e6/0x1010
  ? __kthread_parkme+0xb1/0x1d0
  ? __pfx_worker_thread+0x10/0x10
  ? __pfx_worker_thread+0x10/0x10
  kthread+0x2d2/0x3a0
  ? _raw_spin_unlock_irq+0x28/0x60
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 7 times,
consider switching to WQ_UNBOUND
 workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 11 times,
consider switching to WQ_UNBOUND

Bisect blame this commit:
commit 4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10
Author: Jozsef Kadlecsik <kadlec@netfilter.org>
Date:   Tue Jun 4 15:58:03 2024 +0200

    netfilter: ipset: Fix race between namespace cleanup and gc in the
list:set type

    Lion Ackermann reported that there is a race condition between
namespace cleanup
    in ipset and the garbage collection of the list:set type. The namespace
    cleanup can destroy the list:set type of sets while the gc of the
set type is
    waiting to run in rcu cleanup. The latter uses data from the
destroyed set which
    thus leads use after free. The patch contains the following parts:

    - When destroying all sets, first remove the garbage collectors, then wait
      if needed and then destroy the sets.
    - Fix the badly ordered "wait then remove gc" for the destroy a single set
      case.
    - Fix the missing rcu locking in the list:set type in the userspace test
      case.
    - Use proper RCU list handlings in the list:set type.

    The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list
flush to cancel_gc).

    Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression
in swap operation)
    Reported-by: Lion Ackermann <nnamrec@gmail.com>
    Tested-by: Lion Ackermann <nnamrec@gmail.com>
    Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

 net/netfilter/ipset/ip_set_core.c     | 81
++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
 net/netfilter/ipset/ip_set_list_set.c | 30 ++++++++++++++----------------
 2 files changed, 60 insertions(+), 51 deletions(-)

And I can confirm after reverting 4e7aaa6b82d6 the issue is gone.

I also attach the build config and full kernel log.

My hardware specs: https://linux-hardware.org/?probe=80512f0c04

Jozsef can you look into this please?

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: dmesg-6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug.zip --]
[-- Type: application/zip, Size: 52668 bytes --]

[-- Attachment #3: .config.zip --]
[-- Type: application/zip, Size: 66526 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: 6.10/bisected/regression - commits 4e7aaa6b82d6 cause appearing WARNING at net/netfilter/ipset/ip_set_core.c:1200 suspicious rcu_dereference_protected() usage
  2024-06-20  6:55 6.10/bisected/regression - commits 4e7aaa6b82d6 cause appearing WARNING at net/netfilter/ipset/ip_set_core.c:1200 suspicious rcu_dereference_protected() usage Mikhail Gavrilov
@ 2024-06-20  8:42 ` Pablo Neira Ayuso
  0 siblings, 0 replies; 2+ messages in thread
From: Pablo Neira Ayuso @ 2024-06-20  8:42 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: kadlec, nnamrec, Linux List Kernel Mailing,
	Linux regressions mailing list, netdev

Hi,

A fix has been included in this batch:

https://lore.kernel.org/netfilter-devel/20240619170537.2846-1-pablo@netfilter.org/T/#m39095851d6fc9d1a04d9de66f5580f61c9593389

On Thu, Jun 20, 2024 at 11:55:05AM +0500, Mikhail Gavrilov wrote:
> Hi,
> between 2ef5971ff345 and rc4 I spotted a new regression.
> It is expressed in the appearance warning with stacktrace after one
> minute after boot.
> 
>  =============================
>  WARNING: suspicious RCU usage
>  6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug #1 Tainted:
> G        W    L    -------  ---
>  -----------------------------
>  net/netfilter/ipset/ip_set_core.c:1200 suspicious
> rcu_dereference_protected() usage!
> 
>  other info that might help us debug this:
> 
> 
>  rcu_scheduler_active = 2, debug_locks = 1
>  3 locks held by kworker/u128:1/264:
>   #0: ffff88810813c958 ((wq_completion)netns){+.+.}-{0:0}, at:
> process_one_work+0xeab/0x1460
>   #1: ffffc90001477da0 (net_cleanup_work){+.+.}-{0:0}, at:
> process_one_work+0x82b/0x1460
>   #2: ffffffff97d9ae98 (pernet_ops_rwsem){++++}-{3:3}, at:
> cleanup_net+0xb9/0xa90
> 
>  stack backtrace:
>  CPU: 30 PID: 264 Comm: kworker/u128:1 Tainted: G        W    L
> -------  ---  6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug
> #1
>  Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING
> WIFI, BIOS 2611 04/07/2024
>  Workqueue: netns cleanup_net
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x84/0xd0
>   lockdep_rcu_suspicious.cold+0xa1/0x134
>   _destroy_all_sets+0x1c7/0x560 [ip_set]
>   ip_set_net_exit+0x20/0x50 [ip_set]
>   ops_exit_list+0x99/0x170
>   cleanup_net+0x4d9/0xa90
>   ? __pfx_cleanup_net+0x10/0x10
>   process_one_work+0x8a4/0x1460
>   ? worker_thread+0xe3/0x1010
>   ? __pfx_process_one_work+0x10/0x10
>   ? assign_work+0x16c/0x240
>   worker_thread+0x5e6/0x1010
>   ? __kthread_parkme+0xb1/0x1d0
>   ? __pfx_worker_thread+0x10/0x10
>   ? __pfx_worker_thread+0x10/0x10
>   kthread+0x2d2/0x3a0
>   ? _raw_spin_unlock_irq+0x28/0x60
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x31/0x70
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1a/0x30
>   </TASK>
> 
>  =============================
>  WARNING: suspicious RCU usage
>  6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug #1 Tainted:
> G        W    L    -------  ---
>  -----------------------------
>  net/netfilter/ipset/ip_set_core.c:1211 suspicious
> rcu_dereference_protected() usage!
> 
>  other info that might help us debug this:
> 
> 
>  rcu_scheduler_active = 2, debug_locks = 1
>  3 locks held by kworker/u128:1/264:
>   #0: ffff88810813c958 ((wq_completion)netns){+.+.}-{0:0}, at:
> process_one_work+0xeab/0x1460
>   #1: ffffc90001477da0 (net_cleanup_work){+.+.}-{0:0}, at:
> process_one_work+0x82b/0x1460
>   #2: ffffffff97d9ae98 (pernet_ops_rwsem){++++}-{3:3}, at:
> cleanup_net+0xb9/0xa90
> 
>  stack backtrace:
>  CPU: 30 PID: 264 Comm: kworker/u128:1 Tainted: G        W    L
> -------  ---  6.10.0-0.rc4.20240618git14d7c92f8df9.40.fc41.x86_64+debug
> #1
>  Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING
> WIFI, BIOS 2611 04/07/2024
>  Workqueue: netns cleanup_net
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x84/0xd0
>   lockdep_rcu_suspicious.cold+0xa1/0x134
>   _destroy_all_sets+0x3a8/0x560 [ip_set]
>   ip_set_net_exit+0x20/0x50 [ip_set]
>   ops_exit_list+0x99/0x170
>   cleanup_net+0x4d9/0xa90
>   ? __pfx_cleanup_net+0x10/0x10
>   process_one_work+0x8a4/0x1460
>   ? worker_thread+0xe3/0x1010
>   ? __pfx_process_one_work+0x10/0x10
>   ? assign_work+0x16c/0x240
>   worker_thread+0x5e6/0x1010
>   ? __kthread_parkme+0xb1/0x1d0
>   ? __pfx_worker_thread+0x10/0x10
>   ? __pfx_worker_thread+0x10/0x10
>   kthread+0x2d2/0x3a0
>   ? _raw_spin_unlock_irq+0x28/0x60
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x31/0x70
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1a/0x30
>   </TASK>
>  workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 7 times,
> consider switching to WQ_UNBOUND
>  workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 11 times,
> consider switching to WQ_UNBOUND
> 
> Bisect blame this commit:
> commit 4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10
> Author: Jozsef Kadlecsik <kadlec@netfilter.org>
> Date:   Tue Jun 4 15:58:03 2024 +0200
> 
>     netfilter: ipset: Fix race between namespace cleanup and gc in the
> list:set type
> 
>     Lion Ackermann reported that there is a race condition between
> namespace cleanup
>     in ipset and the garbage collection of the list:set type. The namespace
>     cleanup can destroy the list:set type of sets while the gc of the
> set type is
>     waiting to run in rcu cleanup. The latter uses data from the
> destroyed set which
>     thus leads use after free. The patch contains the following parts:
> 
>     - When destroying all sets, first remove the garbage collectors, then wait
>       if needed and then destroy the sets.
>     - Fix the badly ordered "wait then remove gc" for the destroy a single set
>       case.
>     - Fix the missing rcu locking in the list:set type in the userspace test
>       case.
>     - Use proper RCU list handlings in the list:set type.
> 
>     The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list
> flush to cancel_gc).
> 
>     Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression
> in swap operation)
>     Reported-by: Lion Ackermann <nnamrec@gmail.com>
>     Tested-by: Lion Ackermann <nnamrec@gmail.com>
>     Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
>     Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> 
>  net/netfilter/ipset/ip_set_core.c     | 81
> ++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
>  net/netfilter/ipset/ip_set_list_set.c | 30 ++++++++++++++----------------
>  2 files changed, 60 insertions(+), 51 deletions(-)
> 
> And I can confirm after reverting 4e7aaa6b82d6 the issue is gone.
> 
> I also attach the build config and full kernel log.
> 
> My hardware specs: https://linux-hardware.org/?probe=80512f0c04
> 
> Jozsef can you look into this please?
> 
> -- 
> Best Regards,
> Mike Gavrilov.




^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-06-20  8:43 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-20  6:55 6.10/bisected/regression - commits 4e7aaa6b82d6 cause appearing WARNING at net/netfilter/ipset/ip_set_core.c:1200 suspicious rcu_dereference_protected() usage Mikhail Gavrilov
2024-06-20  8:42 ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).