From: Herve Codina <herve.codina@bootlin.com>
To: Saravana Kannan <saravanak@google.com>,
	Luca Ceresoli <luca.ceresoli@bootlin.com>,
	Nuno Sa <nuno.sa@analog.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Rob Herring <robh+dt@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Lizhi Hou <lizhi.hou@amd.com>, Max Zhen <max.zhen@amd.com>,
	Sonal Santan <sonal.santan@amd.com>,
	Stefano Stabellini <stefano.stabellini@xilinx.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	Allan Nielsen <allan.nielsen@microchip.com>,
	Horatiu Vultur <horatiu.vultur@microchip.com>,
	Steen Hegelund <steen.hegelund@microchip.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Android Kernel Team <kernel-team@android.com>
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
Date: Tue, 27 Feb 2024 16:24:22 +0100
Message-ID: <20240227162422.76a00f11@bootlin.com>
In-Reply-To: <CAGETcx_zB95nyTpi-_kYW_VqnPqMEc8mS9sewSwRNVr0x=7+kA@mail.gmail.com>

Hi Saravana, Luca, Nuno,

On Tue, 20 Feb 2024 16:37:05 -0800
Saravana Kannan <saravanak@google.com> wrote:

...

> >
> > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > index a9a292d6d59b..5c5f808b163e 100644
> > --- a/drivers/of/overlay.c
> > +++ b/drivers/of/overlay.c
> > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> >                 goto out;
> >         }
> >
> > +       /*
> > +        * Wait for any ongoing device link removals before removing some of
> > +        * nodes
> > +        */
> > +       device_link_wait_removal();
> > +  
> 
> Nuno in his patch[1] had this "wait" happen inside
> __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> the issue that Luca reported[2] in this patch series. Is there any
> problem with doing that?
> 
> Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> I don't think that's necessary.

I think the unlock/lock is needed in Luca's case, and therefore in Nuno's
case as well.

In my patch, device_link_wait_removal() is called without the of_mutex locked.

Now, suppose the device_link_wait_removal() call is done with the of_mutex
locked. The following flow is then possible and leads to a deadlock.

of_overlay_remove()
  lock(of_mutex)
     device_link_wait_removal()

And, from the workqueue job execution:
  ...
    device_put()
      some_driver->remove()
        of_overlay_remove() <--- The job will never end.
                                 It is waiting for of_mutex.
                                 Deadlock
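
To make the flow above concrete, here is a minimal userspace sketch using
pthreads. It is not kernel code and all model_* names are hypothetical:
model_of_mutex stands in for of_mutex, and model_wait_removal() stands in for
device_link_wait_removal() by running one pending job to completion. The job
uses trylock rather than lock so that the sketch reports the would-be deadlock
instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for of_mutex (hypothetical model, not kernel code). */
static pthread_mutex_t model_of_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for a queued removal job whose driver remove() path ends up
 * needing the mutex. A real job would block in lock(); trylock is used
 * here so a blocked job is reported as "could not make progress".
 */
static void *model_removal_job(void *arg)
{
	bool *made_progress = arg;

	if (pthread_mutex_trylock(&model_of_mutex) == 0) {
		*made_progress = true;
		pthread_mutex_unlock(&model_of_mutex);
	} else {
		*made_progress = false;
	}
	return NULL;
}

/* Stand-in for device_link_wait_removal(): run the pending job to the end. */
static bool model_wait_removal(void)
{
	pthread_t job;
	bool made_progress = false;
	int rc = pthread_create(&job, NULL, model_removal_job, &made_progress);

	assert(rc == 0);
	pthread_join(job, NULL);
	return made_progress;
}

/*
 * Stand-in for the removal path. Returns true when the pending job could
 * take the mutex, false when it would have been stuck (the deadlock case).
 */
static bool model_overlay_remove(bool drop_lock_while_waiting)
{
	bool job_ok;

	pthread_mutex_lock(&model_of_mutex);

	if (drop_lock_while_waiting)
		pthread_mutex_unlock(&model_of_mutex);

	job_ok = model_wait_removal();

	if (drop_lock_while_waiting)
		pthread_mutex_lock(&model_of_mutex);

	pthread_mutex_unlock(&model_of_mutex);
	return job_ok;
}
```

In this model, model_overlay_remove(false) waits with the mutex held, so the
job cannot take it; model_overlay_remove(true) drops the mutex around the wait
and the job completes.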

A call to of_overlay_remove() from a driver remove() function is perfectly
legitimate. A driver can use overlays, and this is already supported.
For instance:
  https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946

Unlocking/locking the mutex for the device_link_wait_removal() call opens
a window with the mutex unlocked.

What are the consequences of this window, with the mutex unlocked, during
the of_overlay_remove() call?

> 
> Can you move this call to where Nuno did it and see if that works for
> all of you?
> 
> [1] - https://lore.kernel.org/all/20240205-fix-device-links-overlays-v2-2-5344f8c79d57@analog.com/
> [2] - https://lore.kernel.org/all/20231220181627.341e8789@booty/
> 

If the unlock/lock can be done, I plan to unlock/call/lock at the beginning
of free_overlay_changeset():
--- 8< ---
@@ -853,6 +854,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
 {
        int i;
 
+       /*
+        * Wait for any ongoing device link removals before removing some of
+        * nodes.
+        */
+       mutex_unlock(&of_mutex);
+       device_link_wait_removal();
+       mutex_lock(&of_mutex);
+
        if (ovcs->cset.entries.next)
                of_changeset_destroy(&ovcs->cset);
--- 8< ---

I prefer that location (drivers/of/overlay.c) over Nuno's one because of the
need for the unlock/call/lock sequence.
Nuno's call is done in __of_changeset_entry_destroy() (drivers/of/dynamic.c).
IMHO, it is easier to maintain the code if the whole lock, unlock/call/lock,
unlock sequence lives in the same file (i.e. drivers/of/overlay.c).

I have not tested this modification yet, as I need to set up one of my boards
in the right context to reproduce the issue on my side.

Also, I need to take into account some other comments received.

Best regards,
Hervé
