From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>, Ira Weiny <ira.weiny@intel.com>
Subject: Re: [PATCH] cxl/mem: Fix shutdown order
Date: Fri, 29 Sep 2023 10:03:16 +0100 [thread overview]
Message-ID: <20230929100316.00004546@Huawei.com> (raw)
In-Reply-To: <169596542307.790108.11339208844199665348.stgit@dwillia2-xfh.jf.intel.com>
On Thu, 28 Sep 2023 22:30:23 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Ira reports that removing cxl_mock_mem causes a crash with the following
> trace:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000044
> [..]
> RIP: 0010:cxl_region_decode_reset+0x7f/0x180 [cxl_core]
> [..]
> Call Trace:
> <TASK>
> cxl_region_detach+0xe8/0x210 [cxl_core]
> cxl_decoder_kill_region+0x27/0x40 [cxl_core]
> cxld_unregister+0x29/0x40 [cxl_core]
> devres_release_all+0xb8/0x110
> device_unbind_cleanup+0xe/0x70
> device_release_driver_internal+0x1d2/0x210
> bus_remove_device+0xd7/0x150
> device_del+0x155/0x3e0
> device_unregister+0x13/0x60
> devm_release_action+0x4d/0x90
> ? __pfx_unregister_port+0x10/0x10 [cxl_core]
> delete_endpoint+0x121/0x130 [cxl_core]
> devres_release_all+0xb8/0x110
> device_unbind_cleanup+0xe/0x70
> device_release_driver_internal+0x1d2/0x210
> bus_remove_device+0xd7/0x150
> device_del+0x155/0x3e0
> ? lock_release+0x142/0x290
> cdev_device_del+0x15/0x50
> cxl_memdev_unregister+0x54/0x70 [cxl_core]
>
> This crash is due to the clearing out the cxl_memdev's driver context
> (@cxlds) before the driver is done with it. Fix it by keeping the driver
> context valid until device unregistration completes.
Ideally good to have an explicit statement of which bit of context
is used in the cdev_device_del() call.
>
> Fixes: 9cc238c7a526 ("cxl/pci: Introduce cdevm_file_operations")
> Reported-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
I'm not that keen on the fix, because it makes some incidental ordering
changes and the ordering in here is messy. What you have feels like
it's fixing on symptom whereas there are other issues here.
Firstly, as a side note given I'm looking at this
code cxl_memdev_security_shutdown() should perhaps take a
struct cxl_memdev rather than the struct device that it then
uses just to get hold of the cxl_memdev available in
cxl_memdev_shutdown(). The equivalent setup code
cxl_memdev_security_init() already does take a struct
cxl_memdev so this would be desirable for symmetry if nothing else.
However, cxl_memdev_security_shutdown() doesn't actually undo
the stuff in cxl_memdev_security_init() which previously sent me
on a wild goose chase.
Offloading that put_santize callback here to devm looks dubious to me:
https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/cxl/core/memdev.c#L1031
.. for a function mid way through a bunch of stuff that is otherwise cleaned up
by cxl_memdev_unregister() via devm here
https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/cxl/core/memdev.c#L1066
seems like a recipe for future pain. Could we just move the put_santize()
call directly into cxl_memdev_unregister() and the error path in
cxl_memdev_add_dev() and make sure the order is as expected.
I had a long argument written up on why cxl_memdev_security_shutdown()
was in the wrong place in the new sequence before seeing that it
didn't balance with what I assumed it did. So probably still in an illogical
place given I'd expect all the 'security stuff' to be initialized at same
point in setup and tear down sequences (reversed obviously.)
Thanks,
Jonathan
> ---
> drivers/cxl/core/memdev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 14b547c07f54..92d40c5e7efb 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -580,8 +580,8 @@ static void cxl_memdev_unregister(void *_cxlmd)
> struct cxl_memdev *cxlmd = _cxlmd;
> struct device *dev = &cxlmd->dev;
>
> - cxl_memdev_shutdown(dev);
> cdev_device_del(&cxlmd->cdev, dev);
> + cxl_memdev_shutdown(dev);
> put_device(dev);
> }
>
>
next prev parent reply other threads:[~2023-09-29 9:03 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-29 5:30 [PATCH] cxl/mem: Fix shutdown order Dan Williams
2023-09-29 9:03 ` Jonathan Cameron [this message]
2023-09-29 16:27 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230929100316.00004546@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=dan.j.williams@intel.com \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.