Linux CXL
From: Dan Williams <dan.j.williams@intel.com>
To: Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	"Lukas Wunner" <lukas@wunner.de>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Fabio M. De Francesco" <fabio.maria.de.francesco@intel.com>
Cc: <linux-cxl@vger.kernel.org>
Subject: Re: CXL related lockdep splats with 6.12-rc4
Date: Fri, 25 Oct 2024 09:06:02 -0700
Message-ID: <671bc1ea774c7_1bbc6294fb@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <671bb6217b2b1_1b7aea2942f@iweiny-mobl.notmuch>

Ira Weiny wrote:
> I was about to get cxl-fixes soaking last night and hit the following
> lockdep splat.[1]
> 
> It is intermittent, occurring about 3 times so far, while running all the
> cxl-tests (nfit and cxl).
> 
> I've been able to hit it with 6.12-rc4 __without__ the cxl fixes patches.
> 
> So I'm thinking it is something in the device handling which has changed
> or missed in rc1 testing.  The intermittent nature (I can't even narrow
> down which cxl-test test fails.  :-/) is making this hard to track.
> 
> It seems to hit during the firmware-update.sh test (which is not even a
> direct cxl test.)  But not always and may depend on a previous test
> causing a lock state to trigger.
> 
> I don't know if this has appeared because of a config change or what
> because I have been testing since rc1.  Config is in [2].
> 
> I've also been able to hit what looks like a similar splat in [3].  But
> I've not seen that reproduce.
> 
> Any ideas on what might be happening would be appreciated.

Going forward, do look at using gist.github.com to share dumps.

This is tripping over the online firmware activation unit test in
nfit_test, which is strictly an NVDIMM path. The fact that running it
together with the full CXL unit test suite turns up this multi-stage
lockdep splat is interesting, but also not too surprising.

This is part of the reason I only run:

    meson test -C build --suite cxl

...for CXL work, besides the fact that the long-running NVDIMM tests do
not add much value to CXL regression testing.

Online NVDIMM firmware activation handles the difficult side effect
of memory going offline in a way that could cause DMA timeouts and other
problems. So the solution attempts to suspend all devices over the
activation event. Given the violence of suspend, some deployments choose
to just live with the blip in memory response and hope nothing times
out. So, I would say we should probably document
"test/firmware-update.sh" as a low-value test and hope that CXL never
needs to deal with devices going silent to memory cycles in problematic
ways over firmware activation events.

Thread overview: 3+ messages
2024-10-25 15:15 CXL related lockdep splats with 6.12-rc4 Ira Weiny
2024-10-25 16:06 ` Dan Williams [this message]
2024-10-25 16:22   ` Ira Weiny
