* [PATCH V2] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop
@ 2022-09-20 1:44 Duoming Zhou
2022-09-21 7:53 ` Ingo Molnar
0 siblings, 1 reply; 3+ messages in thread
From: Duoming Zhou @ 2022-09-20 1:44 UTC (permalink / raw)
To: peterz, linux-perf-users, linux-kernel
Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
tglx, bp, dave.hansen, x86, hpa, Duoming Zhou
There is a deadlock in rapl_pmu_event_stop(), the process is
shown below:
(thread 1) | (thread 2)
rapl_pmu_event_stop() | rapl_hrtimer_handle()
... | if (!pmu->n_active)
raw_spin_lock_irqsave() //(1) | ...
... |
hrtimer_cancel() | raw_spin_lock_irqsave() //(2)
(block forever)
We hold pmu->lock in position (1) and use hrtimer_cancel() to wait
rapl_hrtimer_handle() to stop, but rapl_hrtimer_handle() also need
pmu->lock in position (2). As a result, the rapl_pmu_event_stop()
will be blocked forever.
This patch extracts hrtimer_cancel() from the protection of
raw_spin_lock_irqsave(). As a result, the rapl_hrtimer_handle()
could obtain the pmu->lock.
Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v2:
- Move hrtimer_cancel() to the end of rapl_pmu_event_stop() function.
arch/x86/events/rapl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 77e3a47af5a..7c110092c83 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -281,8 +281,6 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
if (!(hwc->state & PERF_HES_STOPPED)) {
WARN_ON_ONCE(pmu->n_active <= 0);
pmu->n_active--;
- if (pmu->n_active == 0)
- hrtimer_cancel(&pmu->hrtimer);
list_del(&event->active_entry);
@@ -300,6 +298,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
hwc->state |= PERF_HES_UPTODATE;
}
+ if (!pmu->n_active) {
+ raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ hrtimer_cancel(&pmu->hrtimer);
+ return;
+ }
raw_spin_unlock_irqrestore(&pmu->lock, flags);
}
--
2.17.1
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH V2] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop 2022-09-20 1:44 [PATCH V2] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop Duoming Zhou @ 2022-09-21 7:53 ` Ingo Molnar 2022-09-21 9:50 ` duoming 0 siblings, 1 reply; 3+ messages in thread From: Ingo Molnar @ 2022-09-21 7:53 UTC (permalink / raw) To: Duoming Zhou Cc: peterz, linux-perf-users, linux-kernel, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung, tglx, bp, dave.hansen, x86, hpa * Duoming Zhou <duoming@zju.edu.cn> wrote: > There is a deadlock in rapl_pmu_event_stop(), the process is > shown below: > > (thread 1) | (thread 2) > rapl_pmu_event_stop() | rapl_hrtimer_handle() > ... | if (!pmu->n_active) > raw_spin_lock_irqsave() //(1) | ... > ... | > hrtimer_cancel() | raw_spin_lock_irqsave() //(2) > (block forever) > > We hold pmu->lock in position (1) and use hrtimer_cancel() to wait > rapl_hrtimer_handle() to stop, but rapl_hrtimer_handle() also need > pmu->lock in position (2). As a result, the rapl_pmu_event_stop() > will be blocked forever. > > This patch extracts hrtimer_cancel() from the protection of > raw_spin_lock_irqsave(). As a result, the rapl_hrtimer_handle() > could obtain the pmu->lock. > > Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support") > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> > --- > Changes in v2: > - Move hrtimer_cancel() to the end of rapl_pmu_event_stop() function. > > arch/x86/events/rapl.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c > index 77e3a47af5a..7c110092c83 100644 > --- a/arch/x86/events/rapl.c > +++ b/arch/x86/events/rapl.c > @@ -281,8 +281,6 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode) > if (!(hwc->state & PERF_HES_STOPPED)) { > WARN_ON_ONCE(pmu->n_active <= 0); > pmu->n_active--; > - if (pmu->n_active == 0) > - hrtimer_cancel(&pmu->hrtimer); > > list_del(&event->active_entry); > > @@ -300,6 +298,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode) > hwc->state |= PERF_HES_UPTODATE; > } > > + if (!pmu->n_active) { > + raw_spin_unlock_irqrestore(&pmu->lock, flags); > + hrtimer_cancel(&pmu->hrtimer); > + return; > + } Looks racy now: AFAICS now it's possible for rapl_hrtimer_handle() to execute at an arbitrary moment after pmu->lock is dropped - which could be use-after-free after cleanup_rapl_pmus() executes and the PMU is freed, right? There's also the quality-of-implementation issue of the hrtimer executing in a delayed fashion for the *next* event that may have been added, leading to possibly unexpected results. Thanks, Ingo ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH V2] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop 2022-09-21 7:53 ` Ingo Molnar @ 2022-09-21 9:50 ` duoming 0 siblings, 0 replies; 3+ messages in thread From: duoming @ 2022-09-21 9:50 UTC (permalink / raw) To: Ingo Molnar Cc: peterz, linux-perf-users, linux-kernel, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung, tglx, bp, dave.hansen, x86, hpa Hello, On Wed, 21 Sep 2022 09:53:04 +0200 Ingo Molnar wrote: > * Duoming Zhou <duoming@zju.edu.cn> wrote: > > > There is a deadlock in rapl_pmu_event_stop(), the process is > > shown below: > > > > (thread 1) | (thread 2) > > rapl_pmu_event_stop() | rapl_hrtimer_handle() > > ... | if (!pmu->n_active) > > raw_spin_lock_irqsave() //(1) | ... > > ... | > > hrtimer_cancel() | raw_spin_lock_irqsave() //(2) > > (block forever) > > > > We hold pmu->lock in position (1) and use hrtimer_cancel() to wait > > rapl_hrtimer_handle() to stop, but rapl_hrtimer_handle() also need > > pmu->lock in position (2). As a result, the rapl_pmu_event_stop() > > will be blocked forever. > > > > This patch extracts hrtimer_cancel() from the protection of > > raw_spin_lock_irqsave(). As a result, the rapl_hrtimer_handle() > > could obtain the pmu->lock. > > > > Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support") > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> > > --- > > Changes in v2: > > - Move hrtimer_cancel() to the end of rapl_pmu_event_stop() function. > > > > arch/x86/events/rapl.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c > > index 77e3a47af5a..7c110092c83 100644 > > --- a/arch/x86/events/rapl.c > > +++ b/arch/x86/events/rapl.c > > @@ -281,8 +281,6 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode) > > if (!(hwc->state & PERF_HES_STOPPED)) { > > WARN_ON_ONCE(pmu->n_active <= 0); > > pmu->n_active--; > > - if (pmu->n_active == 0) > > - hrtimer_cancel(&pmu->hrtimer); > > > > list_del(&event->active_entry); > > > > @@ -300,6 +298,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode) > > hwc->state |= PERF_HES_UPTODATE; > > } > > > > + if (!pmu->n_active) { > > + raw_spin_unlock_irqrestore(&pmu->lock, flags); > > + hrtimer_cancel(&pmu->hrtimer); > > + return; > > + } > > Looks racy now: AFAICS now it's possible for rapl_hrtimer_handle() to > execute at an arbitrary moment after pmu->lock is dropped - which could be > use-after-free after cleanup_rapl_pmus() executes and the PMU is freed, > right? > > There's also the quality-of-implementation issue of the hrtimer executing > in a delayed fashion for the *next* event that may have been added, leading > to possibly unexpected results. Thank your for your suggestions! In order to solve the above problems, I come up with the following solution. diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c index 77e3a47af5a..a526a08ee6e 100644 --- a/arch/x86/events/rapl.c +++ b/arch/x86/events/rapl.c @@ -219,11 +219,13 @@ static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer) struct perf_event *event; unsigned long flags; - if (!pmu->n_active) - return HRTIMER_NORESTART; - raw_spin_lock_irqsave(&pmu->lock, flags); + if (!pmu->n_active) { + raw_spin_unlock_irqrestore(&pmu->lock, flags); + return HRTIMER_NORESTART; + } + list_for_each_entry(event, &pmu->active_list, active_entry) rapl_event_update(event); @@ -282,7 +284,7 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode) WARN_ON_ONCE(pmu->n_active <= 0); pmu->n_active--; if (pmu->n_active == 0) - hrtimer_cancel(&pmu->hrtimer); + hrtimer_try_to_cancel(&pmu->hrtimer); list_del(&event->active_entry); Firstly, the deadlock could be mitigated. Because if the timer callback function is running, the hrtimer_try_to_cancel() will directly return. Secondly, the race could be avoided. Because we use pmu->lock to synchronize and move the check "if (!pmu->n_active)" into the protection scope of pmu->lock. If the rapl_pmu_event_stop() has finished, the "pmu->n_active" equals to 0 and the rapl_hrtimer_handle() will return "HRTIMER_NORESTART". Thirdly, this solution will not cause quality-of-implementation issue of the hrtimer. Best regards, Duoming Zhou ^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-09-21 9:53 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-09-20 1:44 [PATCH V2] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop Duoming Zhou 2022-09-21 7:53 ` Ingo Molnar 2022-09-21 9:50 ` duoming
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox