From: Herve Codina <herve.codina@bootlin.com>
To: Saravana Kannan <saravanak@google.com>
Cc: "Nuno Sá" <noname.nuno@gmail.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"Rob Herring" <robh+dt@kernel.org>,
"Frank Rowand" <frowand.list@gmail.com>,
"Lizhi Hou" <lizhi.hou@amd.com>, "Max Zhen" <max.zhen@amd.com>,
"Sonal Santan" <sonal.santan@amd.com>,
"Stefano Stabellini" <stefano.stabellini@xilinx.com>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
"Allan Nielsen" <allan.nielsen@microchip.com>,
"Horatiu Vultur" <horatiu.vultur@microchip.com>,
"Steen Hegelund" <steen.hegelund@microchip.com>,
"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
"Android Kernel Team" <kernel-team@android.com>
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()
Date: Fri, 23 Feb 2024 09:46:28 +0100
Message-ID: <20240223094628.340ad536@bootlin.com>
In-Reply-To: <CAGETcx9h4=k9XW+jZCw9zcVZruNZLPDQDt_sNZYXc05eQ2_uWQ@mail.gmail.com>
Hi,
On Thu, 22 Feb 2024 17:08:28 -0800
Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Feb 20, 2024 at 10:56 PM Nuno Sá <noname.nuno@gmail.com> wrote:
> >
> > On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote:
> > > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <herve.codina@bootlin.com> wrote:
> > > >
> > > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > > introduces a workqueue to release the consumer and supplier devices used
> > > > in the devlink.
> > > > In the queued job, devices are released and in turn, when all the
> > > > references to these devices are dropped, the release function of the
> > > > device itself is called.
> > > >
> > > > Nothing is present to provide some synchronisation with this workqueue
> > > > in order to ensure that all ongoing releasing operations are done and
> > > > so, some other operations can be started safely.
> > > >
> > > > For instance, in the following sequence:
> > > > 1) of_platform_depopulate()
> > > > 2) of_overlay_remove()
> > > >
> > > > During the step 1, devices are released and related devlinks are removed
> > > > (jobs pushed in the workqueue).
> > > > During the step 2, OF nodes are destroyed but, without any
> > > > synchronisation with devlink removal jobs, of_overlay_remove() can raise
> > > > warnings related to missing of_node_put():
> > > > ERROR: memory leak, expected refcount 1 instead of 2
> > > >
> > > > Indeed, the missing of_node_put() call is going to be done, too late,
> > > > from the workqueue job execution.
> > > >
> > > > Introduce device_link_wait_removal() to offer a way to synchronize
> > > > operations waiting for the end of devlink removals (i.e. end of
> > > > workqueue jobs).
> > > > Also, as a flushing operation is done on the workqueue, the workqueue
> > > > used is moved from a system-wide workqueue to a local one.
> > >
> > > Thanks for the bug report and fix. Sorry again about the delay in
> > > reviewing the changes.
> > >
> > > Please add Fixes tag for 80dd33cf72d1.
> > >
> > > > Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> > > > ---
> > > > drivers/base/core.c | 26 +++++++++++++++++++++++---
> > > > include/linux/device.h | 1 +
> > > > 2 files changed, 24 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > index ac026187ac6a..2e102a77758c 100644
> > > > --- a/drivers/base/core.c
> > > > +++ b/drivers/base/core.c
> > > > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> > > > static void __fw_devlink_link_to_consumers(struct device *dev);
> > > > static bool fw_devlink_drv_reg_done;
> > > > static bool fw_devlink_best_effort;
> > > > +static struct workqueue_struct *fw_devlink_wq;
> > > >
> > > > /**
> > > > * __fwnode_link_add - Create a link between two fwnode_handles.
> > > > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device *dev)
> > > > /*
> > > > * It may take a while to complete this work because of the SRCU
> > > > * synchronization in device_link_release_fn() and if the consumer or
> > > > - * supplier devices get deleted when it runs, so put it into the "long"
> > > > - * workqueue.
> > > > + * supplier devices get deleted when it runs, so put it into the
> > > > + * dedicated workqueue.
> > > > */
> > > > - queue_work(system_long_wq, &link->rm_work);
> > > > + queue_work(fw_devlink_wq, &link->rm_work);
> > >
> > > This has nothing to do with fw_devlink. fw_devlink is just triggering
> > > the issue in device links. You can hit this bug without fw_devlink too.
> > > So call this device_link_wq since it's consistent with device_link_* APIs.
> > >
> >
> > I'm not sure if I got this right in my series. I do call devlink_release_queue() to
> > my queue. But on the Overlay side I use fwnode_links_flush_queue() because it looked
> > more sensible from an OF point of view. And including (in OF code) linux/fwnode.h
> > instead of linux/device.h makes more sense to me.
> >
> > > > }
> > > >
> > > > +/**
> > > > + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> > > > + */
> > > > +void device_link_wait_removal(void)
> > > > +{
> > > > + /*
> > > > + * devlink removal jobs are queued in the dedicated work queue.
> > > > + * To be sure that all removal jobs are terminated, ensure that any
> > > > + * scheduled work has run to completion.
> > > > + */
> > > > + drain_workqueue(fw_devlink_wq);
> > >
> > > Is there a reason this needs to be drain_workqueue() instead of
> > > flush_workqueue()? Drain is a stronger guarantee than we need in this
> > > case. All we are trying to make sure is that all the device link
> > > remove work queued so far have completed.
> > >
> >
> > Yeah, I'm also using flush_workqueue().
> >
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > > > +
> > > > static struct class devlink_class = {
> > > > .name = "devlink",
> > > > .dev_groups = devlink_groups,
> > > > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> > > > sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> > > > if (!sysfs_dev_char_kobj)
> > > > goto char_kobj_err;
> > > > + fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > > > + if (!fw_devlink_wq)
> > >
> > > Fix the name appropriately here too please.
> >
> > Hi Saravana,
> >
> > Oh, was not aware of this series... Please look at my first patch. It already has a
> > review tag by Rafael. I think the creation of the queue makes more sense to be done
> > in devlink_class_init(). Moreover, Rafael complained in my first version that
> > erroring out because we failed to create the queue is too harsh since devlinks can
> > still work.
>
> I think Rafael can be convinced on this one. Firstly, if we fail to
> allocate so early, we have bigger problems.
>
> > So, what we do is to schedule the work if we have a queue or to call
> > device_link_release_fn() synchronously if we don't have the queue (note that failing
> > to allocate the queue is very unlikely anyways).
>
> device links don't really work when you synchronously need to delete a
> link since it always uses SRCUs (it used to have a #ifndef CONFIG_SRCU
> locking). That's like saying a code still works when it doesn't hit a
> deadlock condition.
>
> Let's stick with Herve's patch series since he sent it first and it
> has fewer things that need to be fixed. If he ignores this thread for
> too long, you can send a revision of yours again and we can accept
> that.
I'm not ignoring the thread :)
I hope to find some time in the near future to send a v2 of this
series.
Hervé
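The ordering problem described in the quoted patch description (a deferred of_node_put() that runs after of_overlay_remove() has already checked the refcount) can be modelled in a few lines of userspace C. This is only a single-threaded sketch under stated assumptions: struct node, queue_put(), wait_removal() and refcount_seen_at_removal() are hypothetical names made up for illustration, not kernel APIs, and the toy queue runs its jobs only when waited on, whereas a real workqueue runs them asynchronously.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OF node: only the reference count matters. */
struct node {
	int refcount;
};

/* Deferred job: drops one reference on a node when it eventually runs. */
struct put_job {
	struct node *target;
};

#define MAX_JOBS 8

/* Toy "workqueue": jobs sit here until something runs them. */
static struct put_job pending[MAX_JOBS];
static size_t npending;

/* Model of queuing work: record the job, do not run it yet. */
static int queue_put(struct node *n)
{
	if (npending == MAX_JOBS)
		return -1;
	pending[npending++].target = n;
	return 0;
}

/* Model of the wait: run every job queued so far to completion. */
static void wait_removal(void)
{
	for (size_t i = 0; i < npending; i++)
		pending[i].target->refcount--;
	npending = 0;
}

/*
 * Model of the depopulate-then-remove sequence. Returns the refcount
 * observed by the "overlay removal" step, which expects exactly 1.
 */
static int refcount_seen_at_removal(int wait_first)
{
	/* One base reference plus one still held through the link. */
	struct node n = { .refcount = 2 };
	int seen;

	npending = 0;
	queue_put(&n);		/* step 1: link removal defers the put */
	if (wait_first)
		wait_removal();	/* synchronisation point before step 2 */
	seen = n.refcount;	/* step 2: removal checks the count */
	wait_removal();		/* the deferred put runs eventually anyway */
	return seen;
}
```

Calling refcount_seen_at_removal(0) models the reported sequence, where the check sees a refcount of 2 because the put is still pending. Calling refcount_seen_at_removal(1) models waiting for queued removals first, so the check sees the expected 1.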