* Re: [PATCH v2 0/9] driver core / pmdomain: Add support for fined grained sync_state
[not found] ` <CAPDyKFrPz9gaBBp6xV1=KkoemEfapc0p3POZxuBTvDw7Vamxtg@mail.gmail.com>
@ 2026-04-18 11:23 ` Danilo Krummrich
0 siblings, 0 replies; 10+ messages in thread
From: Danilo Krummrich @ 2026-04-18 11:23 UTC (permalink / raw)
To: Ulf Hansson
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, driver-core
On Fri Apr 17, 2026 at 1:27 PM CEST, Ulf Hansson wrote:
> + Danilo (for the driver core changes)
Thanks -- please also remember to Cc: driver-core@lists.linux.dev.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
[not found] ` <20260410104058.83748-2-ulf.hansson@linaro.org>
@ 2026-04-18 11:23 ` Danilo Krummrich
2026-04-22 10:07 ` Ulf Hansson
0 siblings, 1 reply; 10+ messages in thread
From: Danilo Krummrich @ 2026-04-18 11:23 UTC (permalink / raw)
To: Ulf Hansson
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> The common sync_state support isn't fine grained enough for some types of
> suppliers, like power domains for example. Especially when a supplier
> provides multiple independent power domains, each with their own set of
> consumers. In these cases we need to wait for all consumers for all the
> provided power domains before invoking the supplier's ->sync_state().
>
> To allow a more fine grained sync_state support to be implemented on per
> supplier's driver basis, let's add a new optional callback. As soon as
> there is an update worth to consider in regards to managing sync_state for
> a supplier device, __device_links_queue_sync_state() invokes the callback.
>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
> drivers/base/core.c | 7 ++++++-
> include/linux/device/driver.h | 7 +++++++
> 2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 09b98f02f559..4085a011d8ca 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1106,7 +1106,9 @@ int device_links_check_suppliers(struct device *dev)
> * Queues a device for a sync_state() callback when the device links write lock
> * isn't held. This allows the sync_state() execution flow to use device links
> * APIs. The caller must ensure this function is called with
> - * device_links_write_lock() held.
> + * device_links_write_lock() held. Note, if the optional queue_sync_state()
> + * callback has been assigned too, it gets called for every update to allowing a
s/allowing/allow/
> + * more fine grained support to be implemented on per supplier basis.
> *
> * This function does a get_device() to make sure the device is not freed while
> * on this list.
> @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
> if (dev->state_synced)
> return;
>
> + if (dev->driver && dev->driver->queue_sync_state)
> + dev->driver->queue_sync_state(dev);
This seems to be called without the device lock being held, which seems to allow
the queue_sync_state() callback to execute concurrently with remove(). This
opens the door for all kinds of UAF conditions in drivers.
This also made me aware that the above dev_has_sync_state() is probably broken,
as it also performs the following check without the device lock being held.
dev->driver && dev->driver->sync_state
I think nothing prevents dev->driver to become NULL concurrently; in this case
READ_ONCE() should be sufficient though as it doesn't execute the callback.
I will send a patch for this.
> +
> list_for_each_entry(link, &dev->links.consumers, s_node) {
> if (!device_link_test(link, DL_FLAG_MANAGED))
> continue;
> diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> index bbc67ec513ed..bc9ae1cbe03c 100644
> --- a/include/linux/device/driver.h
> +++ b/include/linux/device/driver.h
> @@ -68,6 +68,12 @@ enum probe_type {
> * be called at late_initcall_sync level. If the device has
> * consumers that are never bound to a driver, this function
> * will never get called until they do.
> + * @queue_sync_state: Similar to the ->sync_state() callback, but called to
> + * allow syncing device state to software state in a more fine
> + * grained way. It is called when there is an updated state that
> + * may be worth to consider for any of the consumers linked to
> + * this device. If implemented, the ->sync_state() callback is
> + * required too.
What happens if this is not the case? Maybe worth to check and warn about this
in driver_register().
> * @remove: Called when the device is removed from the system to
> * unbind a device from this driver.
> * @shutdown: Called at shut-down time to quiesce the device.
> @@ -110,6 +116,7 @@ struct device_driver {
>
> int (*probe) (struct device *dev);
> void (*sync_state)(struct device *dev);
> + void (*queue_sync_state)(struct device *dev);
> int (*remove) (struct device *dev);
> void (*shutdown) (struct device *dev);
> int (*suspend) (struct device *dev, pm_message_t state);
> --
> 2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/9] driver core: Add dev_set_drv_queue_sync_state()
[not found] ` <20260410104058.83748-3-ulf.hansson@linaro.org>
@ 2026-04-18 11:23 ` Danilo Krummrich
2026-04-22 10:25 ` Ulf Hansson
0 siblings, 1 reply; 10+ messages in thread
From: Danilo Krummrich @ 2026-04-18 11:23 UTC (permalink / raw)
To: Ulf Hansson
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> diff --git a/include/linux/device.h b/include/linux/device.h
> index e65d564f01cd..f812e70bdf22 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -994,6 +994,18 @@ static inline int dev_set_drv_sync_state(struct device *dev,
> return 0;
> }
>
> +static inline int dev_set_drv_queue_sync_state(struct device *dev,
> + void (*fn)(struct device *dev))
As this is a public function, please add some documentation.
> +{
> + if (!dev || !dev->driver)
> + return 0;
> + if (dev->driver->queue_sync_state && dev->driver->queue_sync_state != fn)
> + return -EBUSY;
> + if (!dev->driver->queue_sync_state)
> + dev->driver->queue_sync_state = fn;
I think this follows dev_set_drv_sync_state(), but I think it is worth pointing
out that it is yet another blocker for moving towards
const struct device_driver.
> + return 0;
> +}
> +
> static inline void dev_set_removable(struct device *dev,
> enum device_removable removable)
> {
> --
> 2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
2026-04-18 11:23 ` [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support Danilo Krummrich
@ 2026-04-22 10:07 ` Ulf Hansson
2026-04-22 10:59 ` Danilo Krummrich
0 siblings, 1 reply; 10+ messages in thread
From: Ulf Hansson @ 2026-04-22 10:07 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Sat, 18 Apr 2026 at 13:23, Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> > The common sync_state support isn't fine grained enough for some types of
> > suppliers, like power domains for example. Especially when a supplier
> > provides multiple independent power domains, each with their own set of
> > consumers. In these cases we need to wait for all consumers for all the
> > provided power domains before invoking the supplier's ->sync_state().
> >
> > To allow a more fine grained sync_state support to be implemented on per
> > supplier's driver basis, let's add a new optional callback. As soon as
> > there is an update worth to consider in regards to managing sync_state for
> > a supplier device, __device_links_queue_sync_state() invokes the callback.
> >
> > Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> > drivers/base/core.c | 7 ++++++-
> > include/linux/device/driver.h | 7 +++++++
> > 2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 09b98f02f559..4085a011d8ca 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1106,7 +1106,9 @@ int device_links_check_suppliers(struct device *dev)
> > * Queues a device for a sync_state() callback when the device links write lock
> > * isn't held. This allows the sync_state() execution flow to use device links
> > * APIs. The caller must ensure this function is called with
> > - * device_links_write_lock() held.
> > + * device_links_write_lock() held. Note, if the optional queue_sync_state()
> > + * callback has been assigned too, it gets called for every update to allowing a
>
> s/allowing/allow/
>
> > + * more fine grained support to be implemented on per supplier basis.
> > *
> > * This function does a get_device() to make sure the device is not freed while
> > * on this list.
> > @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
> > if (dev->state_synced)
> > return;
> >
> > + if (dev->driver && dev->driver->queue_sync_state)
> > + dev->driver->queue_sync_state(dev);
>
> This seems to be called without the device lock being held, which seems to allow
> the queue_sync_state() callback to execute concurrently with remove(). This
> opens the door for all kinds of UAF conditions in drivers.
If that were the case, this whole function would be unsafe even before
this change. I assume this isn't because of how the function is being
called, but I may be wrong.
Anyway, let me add a get/put_device() here somewhere, to ensure we
prevent this from happening. I assume that is what you are proposing?
>
> This also made me aware that the above dev_has_sync_state() is probably broken,
> as it also performs the following check without the device lock being held.
>
> dev->driver && dev->driver->sync_state
>
> I think nothing prevents dev->driver to become NULL concurrently; in this case
> READ_ONCE() should be sufficient though as it doesn't execute the callback.
>
> I will send a patch for this.
Okay, thanks!
>
> > +
> > list_for_each_entry(link, &dev->links.consumers, s_node) {
> > if (!device_link_test(link, DL_FLAG_MANAGED))
> > continue;
> > diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> > index bbc67ec513ed..bc9ae1cbe03c 100644
> > --- a/include/linux/device/driver.h
> > +++ b/include/linux/device/driver.h
> > @@ -68,6 +68,12 @@ enum probe_type {
> > * be called at late_initcall_sync level. If the device has
> > * consumers that are never bound to a driver, this function
> > * will never get called until they do.
> > + * @queue_sync_state: Similar to the ->sync_state() callback, but called to
> > + * allow syncing device state to software state in a more fine
> > + * grained way. It is called when there is an updated state that
> > + * may be worth to consider for any of the consumers linked to
> > + * this device. If implemented, the ->sync_state() callback is
> > + * required too.
>
> What happens if this is not the case? Maybe worth to check and warn about this
> in driver_register().
Good point!
I believe I should also add a check in dev_set_drv_queue_sync_state()
that is added in patch2.
>
> > * @remove: Called when the device is removed from the system to
> > * unbind a device from this driver.
> > * @shutdown: Called at shut-down time to quiesce the device.
> > @@ -110,6 +116,7 @@ struct device_driver {
> >
> > int (*probe) (struct device *dev);
> > void (*sync_state)(struct device *dev);
> > + void (*queue_sync_state)(struct device *dev);
> > int (*remove) (struct device *dev);
> > void (*shutdown) (struct device *dev);
> > int (*suspend) (struct device *dev, pm_message_t state);
> > --
> > 2.43.0
>
Thanks for reviewing!
Kind regards
Uffe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/9] driver core: Add dev_set_drv_queue_sync_state()
2026-04-18 11:23 ` [PATCH v2 2/9] driver core: Add dev_set_drv_queue_sync_state() Danilo Krummrich
@ 2026-04-22 10:25 ` Ulf Hansson
2026-04-22 11:30 ` Danilo Krummrich
0 siblings, 1 reply; 10+ messages in thread
From: Ulf Hansson @ 2026-04-22 10:25 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Sat, 18 Apr 2026 at 13:24, Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index e65d564f01cd..f812e70bdf22 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -994,6 +994,18 @@ static inline int dev_set_drv_sync_state(struct device *dev,
> > return 0;
> > }
> >
> > +static inline int dev_set_drv_queue_sync_state(struct device *dev,
> > + void (*fn)(struct device *dev))
>
> As this is a public function, please add some documentation.
Most of the static inline functions in device.h lacks documentation.
Are you suggesting that we should move towards documenting all of
them?
In this case, dev_set_drv_sync_state() also lacks documentation and to
me, it doesn't make sense to add documentation only for
dev_set_drv_queue_sync_state(). Do you want me to add it for both?
[...]
Kind regards
Uffe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
2026-04-22 10:07 ` Ulf Hansson
@ 2026-04-22 10:59 ` Danilo Krummrich
2026-05-05 11:12 ` Ulf Hansson
0 siblings, 1 reply; 10+ messages in thread
From: Danilo Krummrich @ 2026-04-22 10:59 UTC (permalink / raw)
To: Ulf Hansson
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Wed Apr 22, 2026 at 12:07 PM CEST, Ulf Hansson wrote:
> On Sat, 18 Apr 2026 at 13:23, Danilo Krummrich <dakr@kernel.org> wrote:
>> On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
>> > @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
>> > if (dev->state_synced)
>> > return;
>> >
>> > + if (dev->driver && dev->driver->queue_sync_state)
>> > + dev->driver->queue_sync_state(dev);
>>
>> This seems to be called without the device lock being held, which seems to allow
>> the queue_sync_state() callback to execute concurrently with remove(). This
>> opens the door for all kinds of UAF conditions in drivers.
>
> If that were the case, this whole function would be unsafe even before
> this change. I assume this isn't because of how the function is being
> called, but I may be wrong.
This function does not issue any driver callbacks intentionally; the existing
sync_state() callback is deferred to device_links_flush_sync_list(), which is
called without the device_links_write_lock() held, but takes the device_lock()
to protect against other concurrent driver callbacks, such as remove().
I.e. we can't take the device_lock() when the device_links_write_lock() is held,
as it would be prone to lock inversion.
The documentation of __device_links_queue_sync_state() actually slightly hints
at this, but focuses more on the other reason for the deferred semantics -- the
sync_state() callback may want to call device link APIs.
> Anyway, let me add a get/put_device() here somewhere, to ensure we
> prevent this from happening. I assume that is what you are proposing?
No, an additional device reference count won't protect against other concurrent
driver callbacks, such as remove().
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/9] driver core: Add dev_set_drv_queue_sync_state()
2026-04-22 10:25 ` Ulf Hansson
@ 2026-04-22 11:30 ` Danilo Krummrich
0 siblings, 0 replies; 10+ messages in thread
From: Danilo Krummrich @ 2026-04-22 11:30 UTC (permalink / raw)
To: Ulf Hansson
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Wed Apr 22, 2026 at 12:25 PM CEST, Ulf Hansson wrote:
> Most of the static inline functions in device.h lacks documentation.
> Are you suggesting that we should move towards documenting all of
> them?
I'd prefer that, yes. After all, if they are defined in device.h, they are
public APIs, so they should ideally have documentation.
(If a function is only used throughout the driver core, it should be in
drivers/base/base.h instead.)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
2026-04-22 10:59 ` Danilo Krummrich
@ 2026-05-05 11:12 ` Ulf Hansson
2026-05-05 12:46 ` Danilo Krummrich
0 siblings, 1 reply; 10+ messages in thread
From: Ulf Hansson @ 2026-05-05 11:12 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Wed, 22 Apr 2026 at 12:59, Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Wed Apr 22, 2026 at 12:07 PM CEST, Ulf Hansson wrote:
> > On Sat, 18 Apr 2026 at 13:23, Danilo Krummrich <dakr@kernel.org> wrote:
> >> On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> >> > @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
> >> > if (dev->state_synced)
> >> > return;
> >> >
> >> > + if (dev->driver && dev->driver->queue_sync_state)
> >> > + dev->driver->queue_sync_state(dev);
> >>
> >> This seems to be called without the device lock being held, which seems to allow
> >> the queue_sync_state() callback to execute concurrently with remove(). This
> >> opens the door for all kinds of UAF conditions in drivers.
> >
> > If that were the case, this whole function would be unsafe even before
> > this change. I assume this isn't because of how the function is being
> > called, but I may be wrong.
>
> This function does not issue any driver callbacks intentionally; the existing
> sync_state() callback is deferred to device_links_flush_sync_list(), which is
> called without the device_links_write_lock() held, but takes the device_lock()
> to protect against other concurrent driver callbacks, such as remove().
>
> I.e. we can't take the device_lock() when the device_links_write_lock() is held,
> as it would be prone to lock inversion.
You are right, the device_lock() is needed in
device_links_flush_sync_list() to protect against concurrent driver
callbacks from being invoked, such as the ->remove().
However, as part of the removal procedure in
__device_release_driver(), we also need to account for device links.
Hence we call device_links_busy() which takes the
device_links_write_lock().
>
> The documentation of __device_links_queue_sync_state() actually slightly hints
> at this, but focuses more on the other reason for the deferred semantics -- the
> sync_state() callback may want to call device link APIs.
>
> > Anyway, let me add a get/put_device() here somewhere, to ensure we
> > prevent this from happening. I assume that is what you are proposing?
>
> No, an additional device reference count won't protect against other concurrent
> driver callbacks, such as remove().
Because _device_links_queue_sync_state() is always called while
holding the device_links_write_lock(), it I believe it should be
perfectly safe to invoke the driver callbacks from there.
Kind regards
Uffe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
2026-05-05 11:12 ` Ulf Hansson
@ 2026-05-05 12:46 ` Danilo Krummrich
2026-05-05 14:15 ` Ulf Hansson
0 siblings, 1 reply; 10+ messages in thread
From: Danilo Krummrich @ 2026-05-05 12:46 UTC (permalink / raw)
To: Ulf Hansson
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
On Tue May 5, 2026 at 1:12 PM CEST, Ulf Hansson wrote:
> On Wed, 22 Apr 2026 at 12:59, Danilo Krummrich <dakr@kernel.org> wrote:
>>
>> On Wed Apr 22, 2026 at 12:07 PM CEST, Ulf Hansson wrote:
>> > On Sat, 18 Apr 2026 at 13:23, Danilo Krummrich <dakr@kernel.org> wrote:
>> >> On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
>> >> > @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
>> >> > if (dev->state_synced)
>> >> > return;
>> >> >
>> >> > + if (dev->driver && dev->driver->queue_sync_state)
>> >> > + dev->driver->queue_sync_state(dev);
>> >>
>> >> This seems to be called without the device lock being held, which seems to allow
>> >> the queue_sync_state() callback to execute concurrently with remove(). This
>> >> opens the door for all kinds of UAF conditions in drivers.
>> >
>> > If that were the case, this whole function would be unsafe even before
>> > this change. I assume this isn't because of how the function is being
>> > called, but I may be wrong.
>>
>> This function does not issue any driver callbacks intentionally; the existing
>> sync_state() callback is deferred to device_links_flush_sync_list(), which is
>> called without the device_links_write_lock() held, but takes the device_lock()
>> to protect against other concurrent driver callbacks, such as remove().
>>
>> I.e. we can't take the device_lock() when the device_links_write_lock() is held,
>> as it would be prone to lock inversion.
>
> You are right, the device_lock() is needed in
> device_links_flush_sync_list() to protect against concurrent driver
> callbacks from being invoked, such as the ->remove().
>
> However, as part of the removal procedure in
> __device_release_driver(), we also need to account for device links.
> Hence we call device_links_busy() which takes the
> device_links_write_lock().
I don't think this is sufficient.
I wrapped my head around this about three weeks ago and I think there were
multiple cases where this falls apart. I think one case was:
1. Consumer binds, supplier is added to deferred_sync. Consumer unbinds,
link goes to AVAILABLE.
2. device_links_busy(supplier) takes and releases device_links_write_lock(),
finds no ACTIVE consumers, returns false -- drv->remove() starts without
device_links_write_lock() held.
3. device_links_supplier_sync_state_resume() takes device_links_write_lock(),
finds supplier still on deferred_sync -- device_links_driver_cleanup()
hasn't run yet, blocked on the lock. queue_sync_state() is called
concurrently with drv->remove().
But again, I think that's what I concluded three weeks ago and this state
machine is rather tricky.
That said, *even if* we could prove that this works in all cases, it would be
very subtle, pretty fragile and not by design.
The whole state machine around sync state is already rather complicated. So, if
we really want to make it an invariant that it is valid to call bus device
callbacks without holding the device lock from
__device_links_queue_sync_state(), I think this has to be carefully reflected by
the overall design making it *very explicit* how this invariant holds up in all
cases.
Otherwise, I'd be rather concerned that this becomes a source of subtle bugs.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
2026-05-05 12:46 ` Danilo Krummrich
@ 2026-05-05 14:15 ` Ulf Hansson
0 siblings, 0 replies; 10+ messages in thread
From: Ulf Hansson @ 2026-05-05 14:15 UTC (permalink / raw)
To: Danilo Krummrich, Saravana Kannan
Cc: Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm, Sudeep Holla,
Cristian Marussi, Kevin Hilman, Stephen Boyd, Marek Szyprowski,
Bjorn Andersson, Abel Vesa, Peng Fan, Tomi Valkeinen, Maulik Shah,
Konrad Dybcio, Thierry Reding, Jonathan Hunter,
Geert Uytterhoeven, Dmitry Baryshkov, linux-arm-kernel,
linux-kernel, Geert Uytterhoeven, driver-core
On Tue, 5 May 2026 at 14:46, Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Tue May 5, 2026 at 1:12 PM CEST, Ulf Hansson wrote:
> > On Wed, 22 Apr 2026 at 12:59, Danilo Krummrich <dakr@kernel.org> wrote:
> >>
> >> On Wed Apr 22, 2026 at 12:07 PM CEST, Ulf Hansson wrote:
> >> > On Sat, 18 Apr 2026 at 13:23, Danilo Krummrich <dakr@kernel.org> wrote:
> >> >> On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> >> >> > @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
> >> >> > if (dev->state_synced)
> >> >> > return;
> >> >> >
> >> >> > + if (dev->driver && dev->driver->queue_sync_state)
> >> >> > + dev->driver->queue_sync_state(dev);
> >> >>
> >> >> This seems to be called without the device lock being held, which seems to allow
> >> >> the queue_sync_state() callback to execute concurrently with remove(). This
> >> >> opens the door for all kinds of UAF conditions in drivers.
> >> >
> >> > If that were the case, this whole function would be unsafe even before
> >> > this change. I assume this isn't because of how the function is being
> >> > called, but I may be wrong.
> >>
> >> This function does not issue any driver callbacks intentionally; the existing
> >> sync_state() callback is deferred to device_links_flush_sync_list(), which is
> >> called without the device_links_write_lock() held, but takes the device_lock()
> >> to protect against other concurrent driver callbacks, such as remove().
> >>
> >> I.e. we can't take the device_lock() when the device_links_write_lock() is held,
> >> as it would be prone to lock inversion.
> >
> > You are right, the device_lock() is needed in
> > device_links_flush_sync_list() to protect against concurrent driver
> > callbacks from being invoked, such as the ->remove().
> >
> > However, as part of the removal procedure in
> > __device_release_driver(), we also need to account for device links.
> > Hence we call device_links_busy() which takes the
> > device_links_write_lock().
>
> I don't think this is sufficient.
>
> I wrapped my head around this about three weeks ago and I think there were
> multiple cases where this falls apart. I think one case was:
>
> 1. Consumer binds, supplier is added to deferred_sync. Consumer unbinds,
> link goes to AVAILABLE.
>
> 2. device_links_busy(supplier) takes and releases device_links_write_lock(),
> finds no ACTIVE consumers, returns false -- drv->remove() starts without
> device_links_write_lock() held.
>
> 3. device_links_supplier_sync_state_resume() takes device_links_write_lock(),
> finds supplier still on deferred_sync -- device_links_driver_cleanup()
> hasn't run yet, blocked on the lock. queue_sync_state() is called
> concurrently with drv->remove().
Thanks for pointing this out! More problems seem to exist regarding the above.
What if drv->remove(supplier) completes before
device_links_supplier_sync_state_resume() is called and the removed
device is still in the deferred_sync list? Probably another
get|put_device() is needed to protect us from UAF.
>
> But again, I think that's what I concluded three weeks ago and this state
> machine is rather tricky.
Indeed it is.
>
> That said, *even if* we could prove that this works in all cases, it would be
> very subtle, pretty fragile and not by design.
>
> The whole state machine around sync state is already rather complicated. So, if
> we really want to make it an invariant that it is valid to call bus device
> callbacks without holding the device lock from
> __device_links_queue_sync_state(), I think this has to be carefully reflected by
> the overall design making it *very explicit* how this invariant holds up in all
> cases.
Right.
I experimented with adding the devices for ->queue_sync_state() into a
separate list and then flushing that list similarly to how the
sync_state list is flushed. It becomes messy, but I guess there is no
other way.
Perhaps Saravana has another suggestion for how to implement this?
>
> Otherwise, I'd be rather concerned that this becomes a source of subtle bugs.
Kind regards
Uffe
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-05-05 14:16 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260410104058.83748-1-ulf.hansson@linaro.org>
[not found] ` <CAPDyKFrPz9gaBBp6xV1=KkoemEfapc0p3POZxuBTvDw7Vamxtg@mail.gmail.com>
2026-04-18 11:23 ` [PATCH v2 0/9] driver core / pmdomain: Add support for fined grained sync_state Danilo Krummrich
[not found] ` <20260410104058.83748-2-ulf.hansson@linaro.org>
2026-04-18 11:23 ` [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support Danilo Krummrich
2026-04-22 10:07 ` Ulf Hansson
2026-04-22 10:59 ` Danilo Krummrich
2026-05-05 11:12 ` Ulf Hansson
2026-05-05 12:46 ` Danilo Krummrich
2026-05-05 14:15 ` Ulf Hansson
[not found] ` <20260410104058.83748-3-ulf.hansson@linaro.org>
2026-04-18 11:23 ` [PATCH v2 2/9] driver core: Add dev_set_drv_queue_sync_state() Danilo Krummrich
2026-04-22 10:25 ` Ulf Hansson
2026-04-22 11:30 ` Danilo Krummrich
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox