All of lore.kernel.org
 help / color / mirror / Atom feed
From: Luca Ceresoli <luca.ceresoli@bootlin.com>
To: Saravana Kannan <saravanak@google.com>
Cc: Rob Herring <robh@kernel.org>,
	Herve Codina <herve.codina@bootlin.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Lizhi Hou <lizhi.hou@amd.com>, Max Zhen <max.zhen@amd.com>,
	Sonal Santan <sonal.santan@amd.com>,
	Stefano Stabellini <stefano.stabellini@xilinx.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	Allan Nielsen <allan.nielsen@microchip.com>,
	Horatiu Vultur <horatiu.vultur@microchip.com>,
	Steen Hegelund <steen.hegelund@microchip.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Miquel Raynal <miquel.raynal@bootlin.com>
Subject: Re: [PATCH 0/2] Synchronize DT overlay removal with devlink removals
Date: Wed, 20 Dec 2023 18:16:27 +0100	[thread overview]
Message-ID: <20231220181627.341e8789@booty> (raw)
In-Reply-To: <CAGETcx-F8G3dcN-VTMrbya_=19zXP=S2ORA_qZqy+yND7S41_Q@mail.gmail.com>

Hello Saravana, Rob, Hervé,

[+Miquèl, who contributed to the discussion with Hervé and me]

On Wed, 6 Dec 2023 19:09:06 -0800
Saravana Kannan <saravanak@google.com> wrote:

> On Wed, Dec 6, 2023 at 9:15 AM Rob Herring <robh@kernel.org> wrote:
> >
> > On Thu, Nov 30, 2023 at 06:41:07PM +0100, Herve Codina wrote:  
> > > Hi,  
> >
> > +Saravana for comment  
> 
> I'll respond to this within a week -- very swamped at the moment. The
> main thing I want to make sure is that we don't cause an indirect
> deadlock with this wait(). I'll go back and look at why we added the
> work queue and then check for device/devlink locking issues.

While working on a project unrelated to Hervé's work, I also ended up
in getting sporadic but frequent "ERROR: memory leak, expected refcount
1 instead of..." messages, which persisted even after adding this patch
series on my tree.

My use case is the insertion and removal of a simple overlay describing
a regulator-fixed and an I2C GPIO expander using it. The messages appear
regardless of whether the insertion and removal is done from kernel code
or via the configfs interface (out-of-tree patches from [0]).

I reconstructed the sequence of operations, all of which stem from
of_overlay_remove():

int of_overlay_remove(int *ovcs_id)
{
    ...

    device_link_wait_removal(); // proposed by this patch series

    mutex_lock(&of_mutex);

    ...

    ret = __of_changeset_revert_notify(&ovcs->cset);
    // this ends up calling (excerpt from a long stack trace):
    // -> of_i2c_notify
    // -> device_remove
    // -> devm_regulator_release
    // -> device_link_remove
    // -> devlink_dev_release, which queues work for
    //      device_link_release_fn, which in turn calls:
    //      -> device_put
    //      -> device_release
    //      -> {platform,regulator,...}_dev*_release
    //      -> of_node_put() [**]

    ...

    free_overlay_changeset(ovcs);
    // calls:
    // -> of_changeset_destroy
    // -> __of_changeset_entry_destroy
    // -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d...
    // The error appears or not, based on when the workqueue runs

err_unlock:
    mutex_unlock(&of_mutex);

    ...
}

So this adds up to the question of whether devlink removal should actually
be run asynchronously or not.

A simple short-term solution is to move the call to
device_link_wait_removal() later, just before free_overlay_changeset():


diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 1a8a6620748c..eccf08cf2160 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -1375,12 +1375,6 @@ int of_overlay_remove(int *ovcs_id)
 		goto out;
 	}
 
-	/*
-	 * Wait for any ongoing device link removals before removing some of
-	 * nodes
-	 */
-	device_link_wait_removal();
-
 	mutex_lock(&of_mutex);
 
 	ovcs = idr_find(&ovcs_idr, *ovcs_id);
@@ -1427,6 +1421,14 @@ int of_overlay_remove(int *ovcs_id)
 		if (!ret)
 			ret = ret_tmp;
 
+	/*
+	 * Wait for any ongoing device link removals before removing some of
+	 * nodes
+	 */
+	mutex_unlock(&of_mutex);
+	device_link_wait_removal();
+	mutex_lock(&of_mutex);
+
 	free_overlay_changeset(ovcs);
 
 err_unlock:


This obviously raises the question of whether unlocking and re-locking
the mutex is potentially dangerous. I have no answer to this right away,
but I tested this change with CONFIG_PROVE_LOCKING=y and no issue showed
up after several overlay load/unload sequences so I am not aware of any
actual issues with this change.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/log/?h=topic/overlays

Luca
-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

  reply	other threads:[~2023-12-20 17:16 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-30 17:41 [PATCH 0/2] Synchronize DT overlay removal with devlink removals Herve Codina
2023-11-30 17:41 ` [PATCH 1/2] driver core: Introduce device_link_wait_removal() Herve Codina
2024-02-21  0:31   ` Saravana Kannan
2024-02-21  6:56     ` Nuno Sá
2024-02-23  1:08       ` Saravana Kannan
2024-02-23  8:13         ` Nuno Sá
2024-02-23  8:46         ` Herve Codina
2024-02-23  8:56           ` Nuno Sá
2024-02-23  9:11     ` Herve Codina
2024-02-23 10:45       ` Nuno Sá
2024-02-29 23:26         ` Saravana Kannan
2024-03-01  7:14           ` Nuno Sá
2023-11-30 17:41 ` [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Herve Codina
2024-02-21  0:37   ` Saravana Kannan
2024-02-21  7:03     ` Nuno Sá
2024-02-23  9:45     ` Herve Codina
2024-02-23 10:35       ` Nuno Sá
2024-02-27 15:24     ` Herve Codina
2024-02-27 16:55       ` Nuno Sá
2024-02-27 17:54         ` Herve Codina
2024-02-27 19:07           ` Nuno Sá
2024-02-27 19:13             ` Rafael J. Wysocki
2024-02-27 19:28               ` Nuno Sá
2023-12-06 17:15 ` [PATCH 0/2] Synchronize DT overlay removal with " Rob Herring
2023-12-07  3:09   ` Saravana Kannan
2023-12-20 17:16     ` Luca Ceresoli [this message]
2023-12-20 18:12       ` Herve Codina
2024-02-21  0:19     ` Saravana Kannan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231220181627.341e8789@booty \
    --to=luca.ceresoli@bootlin.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=allan.nielsen@microchip.com \
    --cc=devicetree@vger.kernel.org \
    --cc=frowand.list@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=herve.codina@bootlin.com \
    --cc=horatiu.vultur@microchip.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizhi.hou@amd.com \
    --cc=max.zhen@amd.com \
    --cc=miquel.raynal@bootlin.com \
    --cc=rafael@kernel.org \
    --cc=robh@kernel.org \
    --cc=saravanak@google.com \
    --cc=sonal.santan@amd.com \
    --cc=steen.hegelund@microchip.com \
    --cc=stefano.stabellini@xilinx.com \
    --cc=thomas.petazzoni@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.