From: Ira Weiny <ira.weiny@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	"Lukas Wunner" <lukas@wunner.de>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Fabio M. De Francesco" <fabio.maria.de.francesco@intel.com>
Cc: <linux-cxl@vger.kernel.org>
Subject: Re: CXL related lockdep splats with 6.12-rc4
Date: Fri, 25 Oct 2024 11:22:13 -0500
Message-ID: <671bc5b5b6213_1e4bd5294e9@iweiny-mobl.notmuch>
In-Reply-To: <671bc1ea774c7_1bbc6294fb@dwillia2-xfh.jf.intel.com.notmuch>

Dan Williams wrote:
> Ira Weiny wrote:
> > I was about to get cxl-fixes soaking last night and hit the following
> > lockdep splat.[1]
> > 
> > It is intermittent, occurring about 3 times so far, while running all the
> > cxl-tests (nfit and cxl).
> > 
> > I've been able to hit it with 6.12-rc4 __without__ the cxl fixes patches.
> > 
> > So I'm thinking it is something in the device handling which has changed
> > or was missed in rc1 testing.  The intermittent nature (I can't even narrow
> > down which cxl-test test fails.  :-/) is making this hard to track.
> > 
> > It seems to hit during the firmware-update.sh test (which is not even a
> > direct cxl test.)  But not always and may depend on a previous test
> > causing a lock state to trigger.
> > 
> > I don't know if this has appeared because of a config change or what
> > because I have been testing since rc1.  Config is in [2].
> > 
> > I've also been able to hit what looks like a similar splat in [3].  But
> > I've not seen that reproduce.
> > 
> > Any ideas on what might be happening would be appreciated.
> 
> Going forward do look at using gist.github.com to share dumps.

yea.  sorry.

> 
> This is tripping over the online firmware activation unit test in
> nfit_test which is strictly an NVDIMM path. The fact that running that
> against the full CXL unit test finds this multi-stage lockdep splat is
> interesting but also not too surprising.
> 
> This is part of the reason I only run:
> 
>     meson test -C build --suite cxl

Will do.  But I've never had an issue before...  :-/

> 
> ...for CXL work, besides the long running NVDIMM tests that do not add
> much value to CXL regression.
> 
> Online NVDIMM firmware activation handles this difficult side effect
> of memory going offline in a way that could cause DMA timeouts and other
> problems. So the solution attempts to suspend all devices over the
> activation event. Given the violence of suspend some deployments choose
> to just live with the blip in memory response and hope nothing times
> out. So, I would say we should probably document
> "test/firmware-update.sh" as a low-value test and hope that CXL never
> needs to deal with devices going silent to memory cycles in problematic
> ways over firmware activation events.


Yep, I finally got a good reproducer by running firmware-update.sh followed
by 'modprobe -r cxl-test'.
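
The sequence above can be sketched roughly as follows. This is only an
illustration, not the exact commands used: it assumes the ndctl build
directory is ./build and that meson knows the test by the name
"firmware-update.sh". The run(), reproduce(), and has_lockdep_splat()
helpers and the DRY_RUN and BUILD_DIR variables are hypothetical names
introduced for this sketch. With DRY_RUN=1 (the default) the commands are
only printed.

```shell
#!/bin/sh
# Sketch: run firmware-update.sh, unload cxl-test, then look for a
# lockdep report in the kernel log.
# Assumptions: BUILD_DIR points at the ndctl meson build directory and
# the test is registered under the name "firmware-update.sh".

BUILD_DIR="${BUILD_DIR:-build}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Return 0 if the given log text contains a lockdep report header.
has_lockdep_splat() {
    printf '%s\n' "$1" | grep -Eq \
        'possible circular locking dependency|possible recursive locking|inconsistent lock state'
}

# The two steps named above, in order.
reproduce() {
    run meson test -C "$BUILD_DIR" firmware-update.sh
    run modprobe -r cxl-test
}

reproduce
if [ "$DRY_RUN" != "1" ] && has_lockdep_splat "$(dmesg)"; then
    echo "lockdep report found in dmesg"
fi
```

Running it for real would need root and something like
"DRY_RUN=0 sh repro.sh" (repro.sh being whatever the file is saved as).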

Thanks,
Ira
