public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] driver core: fix async device shutdown hang
@ 2024-09-17 20:15 Stuart Hayes
  2024-09-17 20:42 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 8+ messages in thread
From: Stuart Hayes @ 2024-09-17 20:15 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka
  Cc: Stuart Hayes

Modify device_shutdown() so that supplier devices do not wait for
consumer devices to be shut down first when the devlink is sync state
only, since the consumer is not dependent on the supplier in this case.

Without this change, a circular dependency could hang the system.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
 drivers/base/core.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b69b82da8837..76513e360496 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4898,8 +4898,16 @@ void device_shutdown(void)
 
 		idx = device_links_read_lock();
 		list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
-				device_links_read_lock_held())
+				device_links_read_lock_held()) {
+			/*
+			 * sync_state_only suppliers don't need to wait,
+			 * aren't reordered on devices_kset, so making them
+			 * wait could result in a hang
+			 */
+			if (device_link_flag_is_sync_state_only(link->flags))
+				continue;
 			link->supplier->p->shutdown_after = cookie;
+		}
 		device_links_read_unlock(idx);
 		put_device(dev);
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] driver core: fix async device shutdown hang
  2024-09-17 20:15 Stuart Hayes
@ 2024-09-17 20:42 ` Greg Kroah-Hartman
  2024-09-18  0:20   ` stuart hayes
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Kroah-Hartman @ 2024-09-17 20:42 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: linux-kernel, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	David Jeffery, Jeremy Allison, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka

On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
> Modify device_shutdown() so that supplier devices do not wait for
> consumer devices to be shut down first when the devlink is sync state
> only, since the consumer is not dependent on the supplier in this case.
> 
> Without this change, a circular dependency could hang the system.
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>

What commit id does this fix?  Should it go to stable?

And what driver is causing this problem, is this a regression or for
something new that just got added to the tree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] driver core: fix async device shutdown hang
  2024-09-17 20:42 ` Greg Kroah-Hartman
@ 2024-09-18  0:20   ` stuart hayes
  2024-09-18  6:12     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 8+ messages in thread
From: stuart hayes @ 2024-09-18  0:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	David Jeffery, Jeremy Allison, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka



On 9/17/2024 3:42 PM, Greg Kroah-Hartman wrote:
> On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
>> Modify device_shutdown() so that supplier devices do not wait for
>> consumer devices to be shut down first when the devlink is sync state
>> only, since the consumer is not dependent on the supplier in this case.
>>
>> Without this change, a circular dependency could hang the system.
>>
>> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> 
> What commit id does this fix?  Should it go to stable?
> 
> And what driver is causing this problem, is this a regression or for
> something new that just got added to the tree?
> 
> thanks,
> 
> greg k-h

This fixes commit 8064952c65045f05ee2671fe437770e50c151776, in
driver-core-next & linux-next... it's problem with code that was just
added to the tree (in drivers/base/core.c).  It is not in stable.

Apologies, I should have mentioned that from the start.

The issue was found using qemu... a pl061 device (supplier) and
gpio-keys device (consumer), from a qemu-generated device tree with
the aarch64 architecture.  I don't know why the devlink is in the
sync_state_only state in this case.  I didn't dig into that, because
I figured the shutdown code shouldn't hang regardless.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] driver core: fix async device shutdown hang
  2024-09-18  0:20   ` stuart hayes
@ 2024-09-18  6:12     ` Greg Kroah-Hartman
  2024-09-18 21:07       ` Laurence Oberman
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Kroah-Hartman @ 2024-09-18  6:12 UTC (permalink / raw)
  To: stuart hayes
  Cc: linux-kernel, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	David Jeffery, Jeremy Allison, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka

On Tue, Sep 17, 2024 at 07:20:41PM -0500, stuart hayes wrote:
> 
> 
> On 9/17/2024 3:42 PM, Greg Kroah-Hartman wrote:
> > On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
> > > Modify device_shutdown() so that supplier devices do not wait for
> > > consumer devices to be shut down first when the devlink is sync state
> > > only, since the consumer is not dependent on the supplier in this case.
> > > 
> > > Without this change, a circular dependency could hang the system.
> > > 
> > > Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> > 
> > What commit id does this fix?  Should it go to stable?
> > 
> > And what driver is causing this problem, is this a regression or for
> > something new that just got added to the tree?
> > 
> > thanks,
> > 
> > greg k-h
> 
> This fixes commit 8064952c65045f05ee2671fe437770e50c151776, in
> driver-core-next & linux-next... it's problem with code that was just
> added to the tree (in drivers/base/core.c).  It is not in stable.

Ah, that wasn't obvious, sorry.

> Apologies, I should have mentioned that from the start.

Can you resend this with a "Fixes:" tag in it so I can just take it that
way and not have to edit it by hand?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] driver core: fix async device shutdown hang
  2024-09-18  6:12     ` Greg Kroah-Hartman
@ 2024-09-18 21:07       ` Laurence Oberman
  0 siblings, 0 replies; 8+ messages in thread
From: Laurence Oberman @ 2024-09-18 21:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, stuart hayes
  Cc: linux-kernel, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	David Jeffery, Jeremy Allison, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka

On Wed, 2024-09-18 at 08:12 +0200, Greg Kroah-Hartman wrote:
> On Tue, Sep 17, 2024 at 07:20:41PM -0500, stuart hayes wrote:
> > 
> > 
> > On 9/17/2024 3:42 PM, Greg Kroah-Hartman wrote:
> > > On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
> > > > Modify device_shutdown() so that supplier devices do not wait
> > > > for
> > > > consumer devices to be shut down first when the devlink is sync
> > > > state
> > > > only, since the consumer is not dependent on the supplier in
> > > > this case.
> > > > 
> > > > Without this change, a circular dependency could hang the
> > > > system.
> > > > 
> > > > Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> > > 
> > > What commit id does this fix?  Should it go to stable?
> > > 
> > > And what driver is causing this problem, is this a regression or
> > > for
> > > something new that just got added to the tree?
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > This fixes commit 8064952c65045f05ee2671fe437770e50c151776, in
> > driver-core-next & linux-next... it's problem with code that was
> > just
> > added to the tree (in drivers/base/core.c).  It is not in stable.
> 
> Ah, that wasn't obvious, sorry.
> 
> > Apologies, I should have mentioned that from the start.
> 
> Can you resend this with a "Fixes:" tag in it so I can just take it
> that
> way and not have to edit it by hand?
> 
> thanks,
> 
> greg k-h
> 

FYI 
This patch plus the rest of the original Bundle has been tested 
at Red Hat on a system with 24 nvme devices. 
Improvement was almost 8 times faster to shutdown.

Tested-by: Laurence Oberman <loberman@redhat.com>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] driver core: fix async device shutdown hang
@ 2024-09-19  4:31 Stuart Hayes
  2024-09-19  5:17 ` Greg Kroah-Hartman
  2024-09-19 14:16 ` Nathan Chancellor
  0 siblings, 2 replies; 8+ messages in thread
From: Stuart Hayes @ 2024-09-19  4:31 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Nathan Chancellor, Jan Kiszka
  Cc: Stuart Hayes

Modify device_shutdown() so that supplier devices do not wait for
consumer devices to be shut down first when the devlink is sync state
only, since the consumer is not dependent on the supplier in this case.

Without this change, a circular dependency could hang the system.

Fixes: 8064952c6504 ("driver core: shut down devices asynchronously")

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
The patch this fixes is in driver-core-next and linux-next.

Please let me know if this needs to be a V2 or if it needs anything
else... it is the identical patch I sent in yesterday, except I added
a "Fixes:" tag and the comments.  Thank you for the help!

 drivers/base/core.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b69b82da8837..76513e360496 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4898,8 +4898,16 @@ void device_shutdown(void)
 
 		idx = device_links_read_lock();
 		list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
-				device_links_read_lock_held())
+				device_links_read_lock_held()) {
+			/*
+			 * sync_state_only suppliers don't need to wait,
+			 * aren't reordered on devices_kset, so making them
+			 * wait could result in a hang
+			 */
+			if (device_link_flag_is_sync_state_only(link->flags))
+				continue;
 			link->supplier->p->shutdown_after = cookie;
+		}
 		device_links_read_unlock(idx);
 		put_device(dev);
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] driver core: fix async device shutdown hang
  2024-09-19  4:31 [PATCH] driver core: fix async device shutdown hang Stuart Hayes
@ 2024-09-19  5:17 ` Greg Kroah-Hartman
  2024-09-19 14:16 ` Nathan Chancellor
  1 sibling, 0 replies; 8+ messages in thread
From: Greg Kroah-Hartman @ 2024-09-19  5:17 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: linux-kernel, Rafael J . Wysocki, Martin Belanger,
	Oliver O'Halloran, Daniel Wagner, Keith Busch, Lukas Wunner,
	David Jeffery, Jeremy Allison, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme, Nathan Chancellor, Jan Kiszka

On Wed, Sep 18, 2024 at 11:31:43PM -0500, Stuart Hayes wrote:
> Modify device_shutdown() so that supplier devices do not wait for
> consumer devices to be shut down first when the devlink is sync state
> only, since the consumer is not dependent on the supplier in this case.
> 
> Without this change, a circular dependency could hang the system.
> 
> Fixes: 8064952c6504 ("driver core: shut down devices asynchronously")
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>

Nit, no blank line between Fixes: and signed-off-by is needed.

> ---
> The patch this fixes is in driver-core-next and linux-next.
> 
> Please let me know if this needs to be a V2 or if it needs anything
> else... it is the identical patch I sent in yesterday, except I added
> a "Fixes:" tag and the comments.  Thank you for the help!

In theory, yes, this is a v2 as something did change, but I can take
this as-is for now, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] driver core: fix async device shutdown hang
  2024-09-19  4:31 [PATCH] driver core: fix async device shutdown hang Stuart Hayes
  2024-09-19  5:17 ` Greg Kroah-Hartman
@ 2024-09-19 14:16 ` Nathan Chancellor
  1 sibling, 0 replies; 8+ messages in thread
From: Nathan Chancellor @ 2024-09-19 14:16 UTC (permalink / raw)
  To: Stuart Hayes
  Cc: linux-kernel, Greg Kroah-Hartman, Rafael J . Wysocki,
	Martin Belanger, Oliver O'Halloran, Daniel Wagner,
	Keith Busch, Lukas Wunner, David Jeffery, Jeremy Allison,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Jan Kiszka

On Wed, Sep 18, 2024 at 11:31:43PM -0500, Stuart Hayes wrote:
> Modify device_shutdown() so that supplier devices do not wait for
> consumer devices to be shut down first when the devlink is sync state
> only, since the consumer is not dependent on the supplier in this case.
> 
> Without this change, a circular dependency could hang the system.
> 
> Fixes: 8064952c6504 ("driver core: shut down devices asynchronously")
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>

All of my virtual testing seems happy with this change.

Tested-by: Nathan Chancellor <nathan@kernel.org>

> ---
> The patch this fixes is in driver-core-next and linux-next.
> 
> Please let me know if this needs to be a V2 or if it needs anything
> else... it is the identical patch I sent in yesterday, except I added
> a "Fixes:" tag and the comments.  Thank you for the help!
> 
>  drivers/base/core.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b69b82da8837..76513e360496 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4898,8 +4898,16 @@ void device_shutdown(void)
>  
>  		idx = device_links_read_lock();
>  		list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> -				device_links_read_lock_held())
> +				device_links_read_lock_held()) {
> +			/*
> +			 * sync_state_only suppliers don't need to wait,
> +			 * aren't reordered on devices_kset, so making them
> +			 * wait could result in a hang
> +			 */
> +			if (device_link_flag_is_sync_state_only(link->flags))
> +				continue;
>  			link->supplier->p->shutdown_after = cookie;
> +		}
>  		device_links_read_unlock(idx);
>  		put_device(dev);
>  
> -- 
> 2.39.3
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-09-19 14:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-19  4:31 [PATCH] driver core: fix async device shutdown hang Stuart Hayes
2024-09-19  5:17 ` Greg Kroah-Hartman
2024-09-19 14:16 ` Nathan Chancellor
  -- strict thread matches above, loose matches on Subject: below --
2024-09-17 20:15 Stuart Hayes
2024-09-17 20:42 ` Greg Kroah-Hartman
2024-09-18  0:20   ` stuart hayes
2024-09-18  6:12     ` Greg Kroah-Hartman
2024-09-18 21:07       ` Laurence Oberman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox