public inbox for stable@vger.kernel.org

* Question about perf sibling_list race problem
@ 2023-11-29  9:33 Zhengyuan Liu
  2023-11-29 10:32 ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Zhengyuan Liu @ 2023-11-29  9:33 UTC
  To: peterz, mingo
  Cc: linux-perf-users, stable, 胡海, 刘云,
	huangjinhui, Zhengyuan Liu, acme

Hi, all

We are encountering a perf related soft lockup as shown below:

[25023823.265138] watchdog: BUG: soft lockup - CPU#29 stuck for 45s! [YD:3284696]
[25023823.275772]  net_failover virtio_scsi failover
[25023823.276750] CPU: 29 PID: 3284696 Comm: YD Kdump: loaded Not tainted 4.19.90-23.18.v2101.ky10.aarch64 #1
[25023823.278257] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[25023823.279475] pstate: 80400005 (Nzcv daif +PAN -UAO)
[25023823.280516] pc : perf_iterate_sb+0x1b8/0x1f0
[25023823.281530] lr : perf_iterate_sb+0x18c/0x1f0
[25023823.282529] sp : ffff801f282efbf0
[25023823.283446] x29: ffff801f282efbf0 x28: ffff801f207a8b80
[25023823.284551] x27: 0000000000000000 x26: ffff801f99b355e8
[25023823.285674] x25: 0000000000000000 x24: ffff8019e2fbd800
[25023823.286770] x23: ffff0000093f0018 x22: ffff801f282efc40
[25023823.287864] x21: ffff000008255f60 x20: ffff801ffdf58e80
[25023823.288964] x19: ffff8019f1c27800 x18: 0000000000000000
[25023823.290060] x17: 0000000000000000 x16: 0000000000000000
[25023823.291164] x15: 0400000000000000 x14: 0000000000000000
[25023823.292266] x13: ffff000008c6e340 x12: 0000000000000002
[25023823.293381] x11: ffff000008c6e318 x10: 00000019e5feff20
[25023823.294486] x9 : ffff8019fb49c000 x8 : 0058e6fd335b260e
[25023823.295597] x7 : 0000000100321ed8 x6 : ffff00003d083780
[25023823.296715] x5 : 00ffffffffffffff x4 : 0000801ff4ae0000
[25023823.297860] x3 : ffff801ffdf64cc0 x2 : ffff000009858758
[25023823.298977] x1 : 0000000000000000 x0 : ffff8019e2fbd800
[25023823.300090] Call trace:
[25023823.300962]  perf_iterate_sb+0x1b8/0x1f0
[25023823.301961]  perf_event_task+0x78/0x80
[25023823.302946]  perf_event_exit_task+0xa4/0xb0
[25023823.303978]  do_exit+0x38c/0x5d0
[25023823.304932]  do_group_exit+0x3c/0xd8
[25023823.305904]  get_signal+0x12c/0x740
[25023823.306859]  do_signal+0x158/0x260
[25023823.307795]  do_notify_resume+0xd8/0x358
[25023823.308781]  work_pending+0x8/0x10

We got a vmcore by enabling panic_on_soft_lockup. From the vmcore we
found that the perf_event being accessed through
perf_iterate_sb -> perf_iterate_sb_cpu -> event_filter_match ->
pmu_filter_match -> for_each_sibling_event
had already been removed:

#define for_each_sibling_event(sibling, event)                  \
        if ((event)->group_leader == (event))                   \
                list_for_each_entry((sibling), &(event)->sibling_list, sibling_list)

#define list_for_each_entry(pos, head, member)                          \
    for (pos = __container_of((head)->next, pos, member);               \
         &pos->member != (head);                                        \
         pos = __container_of(pos->member.next, pos, member))

crash> struct perf_event ffff8019e2fbd800
struct perf_event {
  event_entry = {
    next = 0xffff8019f1c27800,
    prev = 0xdead000000000200
  },
  ...
  state = PERF_EVENT_STATE_DEAD,
  ...
}
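
One way to read the event_entry fields in the dump above: prev holds
0xdead000000000200, which matches LIST_POISON2 in v4.19 on arm64 (0x200
plus a pointer delta of 0xdead000000000000), while next is still an
ordinary kernel address. list_del_rcu() leaves exactly this pattern,
because it poisons only prev so that concurrent RCU readers can keep
following next. The sketch below is a user-space illustration only. The
constants are assumed from v4.19's include/linux/poison.h, and
classify_list_ptr() is a hypothetical helper, not a crash or kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values: include/linux/poison.h as of v4.19, with arm64's
 * CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000.
 */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x100ULL + POISON_POINTER_DELTA)
#define LIST_POISON2 (0x200ULL + POISON_POINTER_DELTA)

enum ptr_kind { PTR_OTHER, PTR_LIST_POISON1, PTR_LIST_POISON2 };

/* Hypothetical helper: classify a raw pointer value read from a vmcore. */
static enum ptr_kind classify_list_ptr(uint64_t value)
{
	if (value == LIST_POISON1)
		return PTR_LIST_POISON1;
	if (value == LIST_POISON2)
		return PTR_LIST_POISON2;
	return PTR_OTHER;
}

/* Check the two event_entry fields copied from the crash output. */
static void check_event_entry(void)
{
	/* next: still a normal kernel address */
	assert(classify_list_ptr(0xffff8019f1c27800ULL) == PTR_OTHER);
	/* prev: poisoned, as list_del_rcu() would leave it */
	assert(classify_list_ptr(0xdead000000000200ULL) == PTR_LIST_POISON2);
}
```

If this reading is right, the event had already been unlinked from its
RCU-protected list when the vmcore was taken, which fits
state = PERF_EVENT_STATE_DEAD.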

We also found another process that was deleting from sibling_list:

crash> bt 3284533
PID: 3284533  TASK: ffff801f901ae880  CPU: 16  COMMAND: "YD"
 #0 [ffff801f8cd977f0] __switch_to at ffff000008088ba4
 #1 [ffff801f8cd97810] __schedule at ffff000008bf10c4
 #2 [ffff801f8cd97890] schedule at ffff000008bf17b0
 #3 [ffff801f8cd978a0] schedule_timeout at ffff000008bf5b10
 #4 [ffff801f8cd97960] wait_for_common at ffff000008bf2530
 #5 [ffff801f8cd979f0] wait_for_completion at ffff000008bf2644
 #6 [ffff801f8cd97a10] __wait_rcu_gp at ffff000008171c00
 #7 [ffff801f8cd97a80] synchronize_sched at ffff000008179da8
 #8 [ffff801f8cd97ad0] perf_trace_event_unreg at ffff000008216d50
 #9 [ffff801f8cd97b00] perf_trace_destroy at ffff000008217148
#10 [ffff801f8cd97b20] tp_perf_event_destroy at ffff000008256ae0
#11 [ffff801f8cd97b30] _free_event at ffff00000825f21c
#12 [ffff801f8cd97b70] put_event at ffff00000825faf0
#13 [ffff801f8cd97b80] perf_event_release_kernel at ffff00000825fcb8
#14 [ffff801f8cd97be0] perf_release at ffff00000825fdbc
#15 [ffff801f8cd97bf0] __fput at ffff00000832f0b8
#16 [ffff801f8cd97c30] ____fput at ffff00000832f28c
#17 [ffff801f8cd97c50] task_work_run at ffff00000810f8c8
#18 [ffff801f8cd97c90] do_exit at ffff0000080ef458
#19 [ffff801f8cd97cf0] do_group_exit at ffff0000080ef738
#20 [ffff801f8cd97d20] get_signal at ffff0000080fdde0
#21 [ffff801f8cd97d90] do_signal at ffff00000808e488
#22 [ffff801f8cd97e80] do_notify_resume at ffff00000808e7f4
#23 [ffff801f8cd97ff0] work_pending at ffff000008083f60


So it's reasonable to suspect that perf_iterate_sb was traversing
sibling_list while another process was deleting from it, which
eventually caused for_each_sibling_event to loop endlessly and thus
trigger the soft lockup.

The race scenario thus could be this:

CPU 29:                         CPU 16:
                                perf_event_release_kernel
                                --> mutex_lock(&ctx->mutex)
                                  --> perf_remove_from_context
                                    --> perf_group_detach(event)
for_each_sibling_event()              --> list_del_init(&event->sibling_list)
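
The suspected interleaving can be reproduced in user space. This is a
minimal sketch, not kernel code: the list helpers are simplified copies
of the kernel ones, and walk_with_concurrent_del() and demo_race() are
hypothetical names. The "other CPU" is simulated by calling
list_del_init() on the entry the walker is currently standing on. After
that call the entry points at itself, so the walker never gets back to
the list head. A step limit stands in for the soft lockup watchdog.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink the entry, then point it back at itself, as list_del_init() does. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/*
 * Walk the list the way list_for_each_entry() does. When the walker is
 * standing on `victim`, simulate the other CPU calling list_del_init() on
 * it. Returns the number of entries visited, or -1 if `limit` steps pass
 * without reaching `head` again (the real loop has no limit and spins).
 */
static int walk_with_concurrent_del(struct list_head *head,
				    struct list_head *victim, int limit)
{
	int steps = 0;
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		if (++steps > limit)
			return -1;
		if (pos == victim) {
			list_del_init(victim);
			victim = NULL;
		}
	}
	return steps;
}

/* Hypothetical demo: three siblings, the second is detached mid-walk. */
static int demo_race(void)
{
	struct list_head head, a, b, c;

	INIT_LIST_HEAD(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	/* An undisturbed walk visits all three entries and terminates. */
	assert(walk_with_concurrent_del(&head, NULL, 1000) == 3);

	/* Detaching the current entry traps the walker on it. */
	return walk_with_concurrent_del(&head, &b, 1000);
}
```

Here demo_race() returns -1: once b is reinitialised, pos->next is b
itself and the loop condition pos != head can never become false.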

As commit f3c0eba287049 ("perf: Add a few assertions") said:

"Notable for_each_sibling_event() relies on exclusion from
modification. This would normally be holding either ctx->lock or
ctx->mutex, however due to how things are constructed disabling IRQs
is a valid and sufficient substitute for ctx->lock."

Based on that, we think it is necessary to hold ctx->mutex here, but
current LTS kernels such as 4.19, 5.4, 5.10, and 6.1 do not do so on
this path:

perf_event_task
--> perf_iterate_sb
   --> perf_iterate_sb_cpu
      --> event_filter_match
          --> pmu_filter_match
             --> for_each_sibling_event

Commit bd27568117664 ("perf: Rewrite core context handling") removed
the pmu_filter_match operation, so it may serve as a temporary
workaround for this issue.

However, we would like to confirm whether there really is a race on
sibling_list and, if so, how to fix the current LTS branches.

Thanks in advance.

* Re: Question about perf sibling_list race problem
  2023-11-29  9:33 Question about perf sibling_list race problem Zhengyuan Liu
@ 2023-11-29 10:32 ` Peter Zijlstra
  2023-12-05 11:57   ` Zhengyuan Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2023-11-29 10:32 UTC
  To: Zhengyuan Liu
  Cc: mingo, linux-perf-users, stable, 胡海,
	刘云, huangjinhui, Zhengyuan Liu, acme

On Wed, Nov 29, 2023 at 05:33:33PM +0800, Zhengyuan Liu wrote:
> Hi, all
> 
> We are encountering a perf related soft lockup as shown below:
> 
> [25023823.265138] watchdog: BUG: soft lockup - CPU#29 stuck for 45s!
> [YD:3284696]
> [25023823.275772]  net_failover virtio_scsi failover
> [25023823.276750] CPU: 29 PID: 3284696 Comm: YD Kdump: loaded Not
> tainted 4.19.90-23.18.v2101.ky10.aarch64 #1
          ^^^^^^^^^^^^^^^^^^^

That is some unholy ancient kernel. Please see if you can reproduce on
something recent.

* Re: Question about perf sibling_list race problem
  2023-11-29 10:32 ` Peter Zijlstra
@ 2023-12-05 11:57   ` Zhengyuan Liu
  0 siblings, 0 replies; 3+ messages in thread
From: Zhengyuan Liu @ 2023-12-05 11:57 UTC
  To: Peter Zijlstra
  Cc: mingo, linux-perf-users, stable, 胡海,
	刘云, huangjinhui, Zhengyuan Liu, acme

On Wed, Nov 29, 2023 at 6:32 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Nov 29, 2023 at 05:33:33PM +0800, Zhengyuan Liu wrote:
> > Hi, all
> >
> > We are encountering a perf related soft lockup as shown below:
> >
> > [25023823.265138] watchdog: BUG: soft lockup - CPU#29 stuck for 45s!
> > [YD:3284696]
> > [25023823.275772]  net_failover virtio_scsi failover
> > [25023823.276750] CPU: 29 PID: 3284696 Comm: YD Kdump: loaded Not
> > tainted 4.19.90-23.18.v2101.ky10.aarch64 #1
>           ^^^^^^^^^^^^^^^^^^^
>
> That is some unholy ancient kernel. Please see if you can reproduce on
> something recent.

Sorry for the late reply; my company's mail server has had some trouble.

I don't have a reproducer. It's a production server, and the problem
happens once every few months.
From our analysis, recent kernels shouldn't have this problem after commit
bd27568117664 ("perf: Rewrite core context handling"). But LTS branches such as
v4.19 and v5.4 will be in use for a long time, so I think it's worth
fixing this problem there.

Thanks,
