From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
intel-gfx <intel-gfx@lists.freedesktop.org>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Stable <stable@vger.kernel.org>,
ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@kernel.org>,
Len Brown <lenb@kernel.org>
Subject: Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
Date: Wed, 8 Nov 2017 15:00:36 +0200 [thread overview]
Message-ID: <20171108130036.GZ10981@intel.com> (raw)
In-Reply-To: <2610015.zRzb04OUDn@aspire.rjw.lan>
On Wed, Nov 08, 2017 at 01:41:06PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrjälä wrote:
> > On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > > >> <ville.syrjala@linux.intel.com> wrote:
> > > >> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >> >
> > > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > > >>
> > > >> OK, good catch!
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > Cc: stable@vger.kernel.org
> > > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >> > Cc: Len Brown <lenb@kernel.org>
> > > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > > >> > Cc: Tejun Heo <tj@kernel.org>
> > > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > > >> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >> > ---
> > > >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > > >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> > > >> >
> > > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > > >> > index fbcc73f7a099..18af71057b44 100644
> > > >> > --- a/drivers/acpi/device_pm.c
> > > >> > +++ b/drivers/acpi/device_pm.c
> > > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > > >> >
> > > >> > #ifdef CONFIG_PM
> > > >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > void acpi_pm_wakeup_event(struct device *dev)
> > > >> > {
> > > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > > >> > if (!dev && !func)
> > > >> > return AE_BAD_PARAMETER;
> > > >> >
> > > >> > - mutex_lock(&acpi_pm_notifier_lock);
> > > >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > if (adev->wakeup.flags.notifier_present)
> > > >> > goto out;
> > > >> >
> > > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > - adev->wakeup.context.dev = dev;
> > > >> > - adev->wakeup.context.func = func;
> > > >> > -
> > > >>
> > > >> But this doesn't look good to me.
> > > >>
> > > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > > >>
> > > >> Actually, acpi_install_notify_handler() itself need not be called
> > > >> under the lock, because it does sufficient checks of its own.
> > > >>
> > > >> So say you do
> > > >>
> > > >> mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> if (adev->wakeup.context.dev)
> > > >> goto out;
> > > >>
> > > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> adev->wakeup.context.dev = dev;
> > > >> adev->wakeup.context.func = func;
> > > >>
> > > >> mutex_unlock(&acpi_pm_notifier_lock);
> > > >>
> > > >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > > >> > acpi_pm_notify_handler, NULL);
> > > >> > if (ACPI_FAILURE(status))
> > > >> > goto out;
> > > >> >
> > > >> > + mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > > >>
> > > >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > + adev->wakeup.context.dev = dev;
> > > >> > + adev->wakeup.context.func = func;
> > > >> > adev->wakeup.flags.notifier_present = true;
> > > >> > + mutex_unlock(&acpi_pm_notifier_lock);
> > > >> >
> > > >> > out:
> > > >> > - mutex_unlock(&acpi_pm_notifier_lock);
> > > >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > > >> > return status;
> > > >> > }
> > > >>
> > > >> Then on removal you can clear notifier_present first and drop the lock
> > > >> around the acpi_remove_notify_handler() call and nothing bad will
> > > >> happen.
> > > >>
> > > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > > >> instance will set context.dev and the second one will see it set and
> > > >> bail out. The first instance will then do the rest.
> > > >>
> > > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > > >> instance will see notifier_present set and will clear it, so the
> > > >> second one will see notifier_present unset and it will bail out.
> > > >>
> > > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > > >> unset and bail out, or the latter will see context.dev unset and bail
> > > >> out.
> > > >>
> > > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > > >
> > > > So something like the below (totally untested) should work too, shouldn't it?
> > >
> > > There is a problem if a device is removed while acpi_add_pm_notifier()
> > > is still in progress, in which case with my patch the
> > > acpi_remove_pm_notifier() called from the removal path may bail out
> > > prematurely (that doesn't seem likely to happen, but still I don't see
> > > why it cannot happen), so I'll just use your patch. :-)
> >
> > OK. I was just looking at your version and was pretty much convinced
> > that it would work. But I'll take your word that it might not :)
>
> Well, you don't have to. :-)
>
> The scenario I have in mind is as follows:
>
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
> lock. notifier_present is still unset.
>
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
> It is (still) unset, so the function decides that there's nothing to do.
>
> 3. acpi_add_pm_notifier() continues with notifier installation and the
> device goes away at the same time.
Right. That makes sense. Though I don't know what would prevent racing
add_pm_notifier() against remove_pm_notifier() in a similar way even
with the outer lock. In that case the remove_pm_notifier() would just
have to get at the mutex first and then we'd still end up with the
notifier installed.
--
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
WARNING: multiple messages have this Message-ID (diff)
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
intel-gfx <intel-gfx@lists.freedesktop.org>,
Stable <stable@vger.kernel.org>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Len Brown <lenb@kernel.org>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock
Date: Wed, 8 Nov 2017 15:00:36 +0200 [thread overview]
Message-ID: <20171108130036.GZ10981@intel.com> (raw)
In-Reply-To: <2610015.zRzb04OUDn@aspire.rjw.lan>
On Wed, Nov 08, 2017 at 01:41:06PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 8, 2017 1:31:22 PM CET Ville Syrj�l� wrote:
> > On Wed, Nov 08, 2017 at 01:23:56PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Nov 8, 2017 at 12:06 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > On Tuesday, November 7, 2017 11:47:54 PM CET Rafael J. Wysocki wrote:
> > > >> On Tue, Nov 7, 2017 at 10:08 PM, Ville Syrjala
> > > >> <ville.syrjala@linux.intel.com> wrote:
> > > >> > From: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > > >> >
> > > >> > acpi_remove_pm_notifier() ends up calling flush_workqueue() while
> > > >> > holding acpi_pm_notifier_lock, and that same lock is taken by
> > > >> > by the work via acpi_pm_notify_handler(). This can deadlock.
> > > >>
> > > >> OK, good catch!
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > Cc: stable@vger.kernel.org
> > > >> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >> > Cc: Len Brown <lenb@kernel.org>
> > > >> > Cc: Peter Zijlstra <peterz@infradead.org>
> > > >> > Cc: Tejun Heo <tj@kernel.org>
> > > >> > Cc: Ingo Molnar <mingo@kernel.org>
> > > >> > Fixes: c072530f391e ("ACPI / PM: Revork the handling of ACPI device wakeup notifications")
> > > >> > Signed-off-by: Ville Syrj�l� <ville.syrjala@linux.intel.com>
> > > >> > ---
> > > >> > drivers/acpi/device_pm.c | 21 ++++++++++++---------
> > > >> > 1 file changed, 12 insertions(+), 9 deletions(-)
> > > >> >
> > > >> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > > >> > index fbcc73f7a099..18af71057b44 100644
> > > >> > --- a/drivers/acpi/device_pm.c
> > > >> > +++ b/drivers/acpi/device_pm.c
> > > >> > @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable);
> > > >> >
> > > >> > #ifdef CONFIG_PM
> > > >> > static DEFINE_MUTEX(acpi_pm_notifier_lock);
> > > >> > +static DEFINE_MUTEX(acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > void acpi_pm_wakeup_event(struct device *dev)
> > > >> > {
> > > >> > @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev,
> > > >> > if (!dev && !func)
> > > >> > return AE_BAD_PARAMETER;
> > > >> >
> > > >> > - mutex_lock(&acpi_pm_notifier_lock);
> > > >> > + mutex_lock(&acpi_pm_notifier_install_lock);
> > > >> >
> > > >> > if (adev->wakeup.flags.notifier_present)
> > > >> > goto out;
> > > >> >
> > > >> > - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > - adev->wakeup.context.dev = dev;
> > > >> > - adev->wakeup.context.func = func;
> > > >> > -
> > > >>
> > > >> But this doesn't look good to me.
> > > >>
> > > >> notifier_present should be checked under acpi_pm_notifier_lock.
> > > >>
> > > >> Actually, acpi_install_notify_handler() itself need not be called
> > > >> under the lock, because it does sufficient checks of its own.
> > > >>
> > > >> So say you do
> > > >>
> > > >> mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> if (adev->wakeup.context.dev)
> > > >> goto out;
> > > >>
> > > >> adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> adev->wakeup.context.dev = dev;
> > > >> adev->wakeup.context.func = func;
> > > >>
> > > >> mutex_unlock(&acpi_pm_notifier_lock);
> > > >>
> > > >> > status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY,
> > > >> > acpi_pm_notify_handler, NULL);
> > > >> > if (ACPI_FAILURE(status))
> > > >> > goto out;
> > > >> >
> > > >> > + mutex_lock(&acpi_pm_notifier_lock);
> > > >>
> > > >> And here you just set notifier_present under acpi_pm_notifier_lock.
> > > >>
> > > >> > + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev));
> > > >> > + adev->wakeup.context.dev = dev;
> > > >> > + adev->wakeup.context.func = func;
> > > >> > adev->wakeup.flags.notifier_present = true;
> > > >> > + mutex_unlock(&acpi_pm_notifier_lock);
> > > >> >
> > > >> > out:
> > > >> > - mutex_unlock(&acpi_pm_notifier_lock);
> > > >> > + mutex_unlock(&acpi_pm_notifier_install_lock);
> > > >> > return status;
> > > >> > }
> > > >>
> > > >> Then on removal you can clear notifier_present first and drop the lock
> > > >> around the acpi_remove_notify_handler() call and nothing bad will
> > > >> happen.
> > > >>
> > > >> If you call acpi_add_pm_notifier() twice in parallel, the first
> > > >> instance will set context.dev and the second one will see it set and
> > > >> bail out. The first instance will then do the rest.
> > > >>
> > > >> If you call acpi_remove_pm_notifier() twice in a row, the first
> > > >> instance will see notifier_present set and will clear it, so the
> > > >> second one will see notifier_present unset and it will bail out.
> > > >>
> > > >> Now, if you call acpi_remove_pm_notifier() in parallel with
> > > >> acpi_add_pm_notifier(), either the former will see notifier_present
> > > >> unset and bail out, or the latter will see context.dev unset and bail
> > > >> out.
> > > >>
> > > >> It doesn't look like the outer lock is needed, or have I missed anything?
> > > >
> > > > So something like the below (totally untested) should work too, shouldn't it?
> > >
> > > There is a problem if a device is removed while acpi_add_pm_notifier()
> > > is still in progress, in which case with my patch the
> > > acpi_remove_pm_notifier() called from the removal path may bail out
> > > prematurely (that doesn't seem likely to happen, but still I don't see
> > > why it cannot happen), so I'll just use your patch. :-)
> >
> > OK. I was just looking at your version and was pretty much convinced
> > that it would work. But I'll take your word that it might not :)
>
> Well, you don't have to. :-)
>
> The scenario I have in mind is as follows:
>
> 1. acpi_add_pm_notifier() sets context.dev and context.func and drops the
> lock. notifier_present is still unset.
>
> 2. acpi_remove_pm_notifier() checks notifier_present under the lock.
> It is (still) unset, so the function decides that there's nothing to do.
>
> 3. acpi_add_pm_notifier() continues with notifier installation and the
> device goes away at the same time.
Right. That makes sense. Though I don't know what would prevent racing
add_pm_notifier() against remove_pm_notifier() in a similar way even
with the outer lock. In that case the remove_pm_notifier() would just
have to get at the mutex first and then we'd still end up with the
notifier installed.
--
Ville Syrj�l�
Intel OTC
next prev parent reply other threads:[~2017-11-08 13:00 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-07 21:08 [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs. flush_workqueue() deadlock Ville Syrjala
2017-11-07 22:47 ` Rafael J. Wysocki
2017-11-07 22:47 ` Rafael J. Wysocki
2017-11-07 23:06 ` Rafael J. Wysocki
2017-11-07 23:06 ` Rafael J. Wysocki
2017-11-08 12:23 ` Rafael J. Wysocki
2017-11-08 12:23 ` Rafael J. Wysocki
2017-11-08 12:31 ` Ville Syrjälä
2017-11-08 12:31 ` Ville Syrjälä
2017-11-08 12:41 ` Rafael J. Wysocki
2017-11-08 12:41 ` Rafael J. Wysocki
2017-11-08 12:55 ` Rafael J. Wysocki
2017-11-08 13:00 ` Ville Syrjälä [this message]
2017-11-08 13:00 ` Ville Syrjälä
2017-11-08 9:47 ` Rafael J. Wysocki
2017-11-08 9:47 ` Rafael J. Wysocki
2017-11-08 9:13 ` ✗ Fi.CI.BAT: failure for " Patchwork
2017-11-08 12:21 ` ✓ Fi.CI.BAT: success " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171108130036.GZ10981@intel.com \
--to=ville.syrjala@linux.intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=rafael@kernel.org \
--cc=rjw@rjwysocki.net \
--cc=stable@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.