From: Ira Weiny <ira.weiny@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
"Lukas Wunner" <lukas@wunner.de>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
"Fabio M. De Francesco" <fabio.maria.de.francesco@intel.com>
Cc: <linux-cxl@vger.kernel.org>
Subject: Re: CXL related lockdep splats with 6.12-rc4
Date: Fri, 25 Oct 2024 11:22:13 -0500
Message-ID: <671bc5b5b6213_1e4bd5294e9@iweiny-mobl.notmuch>
In-Reply-To: <671bc1ea774c7_1bbc6294fb@dwillia2-xfh.jf.intel.com.notmuch>
Dan Williams wrote:
> Ira Weiny wrote:
> > I was about to get cxl-fixes soaking last night and hit the following
> > lockdep splat.[1]
> >
> > It is intermittent, occurring about 3 times so far, while running all the
> > cxl-tests (nfit and cxl).
> >
> > I've been able to hit it with 6.12-rc4 __without__ the cxl fixes patches.
> >
> > So I'm thinking it is something in the device handling which has changed
> > or was missed in rc1 testing. The intermittent nature (I can't even narrow
> > down which cxl-test test fails. :-/) is making this hard to track.
> >
> > It seems to hit during the firmware-update.sh test (which is not even a
> > direct cxl test.) But not always and may depend on a previous test
> > causing a lock state to trigger.
> >
> > I don't know if this has appeared because of a config change or what
> > because I have been testing since rc1. Config is in [2].
> >
> > I've also been able to hit what looks like a similar splat in [3]. But
> > I've not seen that reproduce.
> >
> > Any ideas on what might be happening would be appreciated.
>
> Going forward do look at using gist.github.com to share dumps.
yea. sorry.
>
> This is tripping over the online firmware activation unit test in
> nfit_test which is strictly an NVDIMM path. The fact that running that
> against the full CXL unit test finds this multi-stage lockdep splat is
> interesting but also not too surprising.
>
> This is part of the reason I only run:
>
> meson test -C build --suite cxl
Will do. But I've never had an issue before... :-/
>
> ...for CXL work, besides the long running NVDIMM tests that do not add
> much value to CXL regression.
>
> Online NVDIMM firmware activation handles this difficult side effect
> of memory going offline in a way that could cause DMA timeouts and other
> problems. So the solution attempts to suspend all devices over the
> activation event. Given the violence of suspend some deployments choose
> to just live with the blip in memory response and hope nothing times
> out. So, I would say we should probably document
> "test/firmware-update.sh" as a low-value test and hope that CXL never
> needs to deal with devices going silent to memory cycles in problematic
> ways over firmware activation events.
Yep, I finally got a good reproducer by running firmware-update.sh followed
by 'modprobe -r cxl-test'.
Thanks,
Ira