devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Nuno Sá" <noname.nuno@gmail.com>
To: Rob Herring <robh@kernel.org>
Cc: Herve Codina <herve.codina@bootlin.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Lizhi Hou <lizhi.hou@amd.com>, Max Zhen <max.zhen@amd.com>,
	 Sonal Santan <sonal.santan@amd.com>,
	Stefano Stabellini <stefano.stabellini@xilinx.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-kernel@vger.kernel.org,  devicetree@vger.kernel.org,
	Allan Nielsen <allan.nielsen@microchip.com>,
	 Horatiu Vultur <horatiu.vultur@microchip.com>,
	Steen Hegelund <steen.hegelund@microchip.com>,
	Luca Ceresoli <luca.ceresoli@bootlin.com>,
	Nuno Sa <nuno.sa@analog.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
Date: Mon, 04 Mar 2024 16:36:15 +0100	[thread overview]
Message-ID: <93d0adf0ee6f3737af4d482dc206fe152f762482.camel@gmail.com> (raw)
In-Reply-To: <20240304152202.GA222088-robh@kernel.org>

On Mon, 2024-03-04 at 09:22 -0600, Rob Herring wrote:
> On Thu, Feb 29, 2024 at 12:18:49PM +0100, Nuno Sá wrote:
> > On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote:
> > > In the following sequence:
> > >   1) of_platform_depopulate()
> > >   2) of_overlay_remove()
> > > 
> > > During the step 1, devices are destroyed and devlinks are removed.
> > > During the step 2, OF nodes are destroyed but
> > > __of_changeset_entry_destroy() can raise warnings related to missing
> > > of_node_put():
> > >   ERROR: memory leak, expected refcount 1 instead of 2 ...
> > > 
> > > Indeed, during the devlink removals performed at step 1, the removal
> > > itself releasing the device (and the attached of_node) is done by a job
> > > queued in a workqueue and so, it is done asynchronously with respect to
> > > function calls.
> > > When the warning is present, of_node_put() will be called but wrongly
> > > too late from the workqueue job.
> > > 
> > > In order to be sure that any ongoing devlink removals are done before
> > > the of_node destruction, synchronize the of_overlay_remove() with the
> > > devlink removals.
> > > 
> > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> > > ---
> > >  drivers/of/overlay.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > index 2ae7e9d24a64..7a010a62b9d8 100644
> > > --- a/drivers/of/overlay.c
> > > +++ b/drivers/of/overlay.c
> > > @@ -8,6 +8,7 @@
> > >  
> > >  #define pr_fmt(fmt)	"OF: overlay: " fmt
> > >  
> > > +#include <linux/device.h>
> > 
> > This is clearly up to the DT maintainers to decide but, IMHO, I would very
> > much
> > prefer to see fwnode.h included in here rather than directly device.h (so
> > yeah,
> > renaming the function to fwnode_*).
> 
> IMO, the DT code should know almost nothing about fwnode because that's 
> the layer above it. But then overlay stuff is kind of a layer above the 
> core DT code too.

Yeah, my reasoning is just that it may be better than knowing about device.h
code... But maybe I'm wrong :)

> 
> > But yeah, I might be biased by own series :)
> > 
> > >  #include <linux/kernel.h>
> > >  #include <linux/module.h>
> > >  #include <linux/of.h>
> > > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct
> > > overlay_changeset *ovcs)
> > >  {
> > >  	int i;
> > >  
> > > +	/*
> > > +	 * Wait for any ongoing device link removals before removing some
> > > of
> > > +	 * nodes. Drop the global lock while waiting
> > > +	 */
> > > +	mutex_unlock(&of_mutex);
> > > +	device_link_wait_removal();
> > > +	mutex_lock(&of_mutex);
> > 
> > I'm still not convinced we need to drop the lock. What happens if someone
> > else
> > grabs the lock while we are in device_link_wait_removal()? Can we guarantee
> > that
> > we can't screw things badly?
> 
> It is also just ugly because it's the callers of 
> free_overlay_changeset() that hold the lock and now we're releasing it 
> behind their back.
> 
> As device_link_wait_removal() is called before we touch anything, can't 
> it be called before we take the lock? And do we need to call it if 
> applying the overlay fails?
> 

My natural feeling was to put it right before checking the node refcount... and
I would like to still see proof that there's any potential deadlock. I did not
checked the code but the issue with calling it before we take the lock is that
likely the device links wont be removed because the overlay removal path (which
unbinds devices from drivers) needs to run under the lock?

- Nuno Sá

  reply	other threads:[~2024-03-04 15:32 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-29 10:52 [PATCH v3 0/2] Synchronize DT overlay removal with devlink removals Herve Codina
2024-02-29 10:52 ` [PATCH v3 1/2] driver core: Introduce device_link_wait_removal() Herve Codina
2024-02-29 11:16   ` Nuno Sá
2024-02-29 12:57     ` Rafael J. Wysocki
2024-02-29 13:01     ` Rafael J. Wysocki
2024-02-29 13:06       ` Nuno Sá
2024-02-29 13:10         ` Rafael J. Wysocki
2024-02-29 14:00           ` Herve Codina
2024-02-29 10:52 ` [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Herve Codina
2024-02-29 11:18   ` Nuno Sá
2024-03-04 15:22     ` Rob Herring
2024-03-04 15:36       ` Nuno Sá [this message]
2024-03-04 16:49       ` Herve Codina
2024-03-05  6:47         ` Saravana Kannan
2024-03-05  7:36           ` Nuno Sá
2024-03-05 10:27             ` Herve Codina
2024-03-05 10:47               ` Nuno Sá
2024-03-06  2:59                 ` Saravana Kannan
2024-03-04 15:02 ` [PATCH v3 0/2] Synchronize DT overlay removal with " Rob Herring
2024-03-05  6:17   ` Saravana Kannan
2024-03-05  7:17     ` Nuno Sá

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=93d0adf0ee6f3737af4d482dc206fe152f762482.camel@gmail.com \
    --to=noname.nuno@gmail.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=allan.nielsen@microchip.com \
    --cc=devicetree@vger.kernel.org \
    --cc=frowand.list@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=herve.codina@bootlin.com \
    --cc=horatiu.vultur@microchip.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizhi.hou@amd.com \
    --cc=luca.ceresoli@bootlin.com \
    --cc=max.zhen@amd.com \
    --cc=nuno.sa@analog.com \
    --cc=rafael@kernel.org \
    --cc=robh@kernel.org \
    --cc=sonal.santan@amd.com \
    --cc=stable@vger.kernel.org \
    --cc=steen.hegelund@microchip.com \
    --cc=stefano.stabellini@xilinx.com \
    --cc=thomas.petazzoni@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).