RE: Questions about CXL device (type 3 memory) hotplug

From: Dan Williams <dan.j.williams@intel.com>
To: Vikram Sethi <vsethi@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"catalin.marinas@arm.com" <Catalin.Marinas@arm.com>,
	James Morse <james.morse@arm.com>
Cc: "Natu, Mahesh" <mahesh.natu@intel.com>
Subject: RE: Questions about CXL device (type 3 memory) hotplug
Date: Wed, 24 May 2023 14:20:23 -0700
Message-ID: <646e7f96f33e2_33fb3294c1@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <BN8PR12MB3330831F2E666E9BB1319E66BD419@BN8PR12MB3330.namprd12.prod.outlook.com>

Vikram Sethi wrote:
[..]
> > I don't understand this failure mode. Accelerator is added, driver sets up an
> > HDM decode range and triggers CPU cache invalidation before mapping the
> > memory into page tables. Wouldn't the device, upon receiving an invalidation
> > request, just snoop its caches and say "nothing for me to do"?
> 
> Device's snoop filter is in a clean reset/power on state. It is not
> tracking anything checked out by the host CPU/peer.  If it starts
> receiving writebacks or even CleanEvicts for its memory, 

CleanEvict is a device-to-host request. We are talking about
host-to-device requests, which are only SnpData, SnpInv, and SnpCur,
right?

> looks like an unexpected coherency message and I know of at least one
> implementation that triggers an error interrupt in response. I don't
> know of a statement in the specification that this is expected and
> that implementations should ignore it. If there is such a statement,
> could you please point me to it?

All the specification says (CXL 3.0 3.2.4.4 Host to Device Requests) is
what to do *if* the device is holding that cacheline.

If a device fails when it receives one of those requests for a line it
does not hold, then how can this work in the nominal case, where the
device does not own any given cacheline?
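
To make the nominal case concrete, here is a minimal sketch of a
device-side snoop handler in which a snoop for a line the device does
not hold is answered with a "miss" response rather than treated as an
error. It is a toy model only: the struct, the line states, and the
response names (RSP_MISS and friends) are hypothetical and are not
taken from the specification or from any real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-to-device snoop opcodes named in this thread. */
enum h2d_snoop { SNP_DATA, SNP_INV, SNP_CUR };

/* Simplified line states (hypothetical, not the spec's state table). */
enum line_state { LINE_INVALID, LINE_SHARED, LINE_MODIFIED };

/* Simplified responses (hypothetical names). RSP_MISS: line not held. */
enum d2h_rsp { RSP_MISS, RSP_HIT_CLEAN, RSP_HIT_DIRTY_FWD };

#define NLINES 4

struct dev_cache {
	uint64_t addr[NLINES];
	enum line_state state[NLINES];
};

/* Return the index of a valid line matching addr, or NLINES if none. */
static size_t lookup(const struct dev_cache *c, uint64_t addr)
{
	for (size_t i = 0; i < NLINES; i++)
		if (c->state[i] != LINE_INVALID && c->addr[i] == addr)
			return i;
	return NLINES;
}

enum d2h_rsp handle_snoop(struct dev_cache *c, enum h2d_snoop op,
			  uint64_t addr)
{
	assert(c != NULL);

	size_t i = lookup(c, addr);

	/* Nominal case: nothing cached for this address, nothing to do. */
	if (i == NLINES)
		return RSP_MISS;

	bool dirty = (c->state[i] == LINE_MODIFIED);

	switch (op) {
	case SNP_INV:	/* give up the line entirely */
		c->state[i] = LINE_INVALID;
		break;
	case SNP_DATA:	/* downgrade to shared in this simplified model */
		c->state[i] = LINE_SHARED;
		break;
	case SNP_CUR:	/* report current data, keep the state */
		break;
	}
	return dirty ? RSP_HIT_DIRTY_FWD : RSP_HIT_CLEAN;
}
```

With a freshly reset cache (all lines invalid), every call to
handle_snoop() takes the RSP_MISS path, which is the behavior the
question above assumes a device must already support.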

> Remove memory needs a cache flush IMO, in a way that prevents
> speculative fetches.  This can be done in kernel with uncacheable
> mappings alone, if possible in the arch callback, or via FW call. 

That assumes that the kernel owns all mappings. I worry about mappings
that the kernel cannot see like x86 SMM. That's why it's currently an
invalidate before next usage, but I am not opposed to also flushing on
remove if the current solution is causing device failures in practice.
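
The ordering being discussed can be sketched as a small userspace model.
This is an illustration only, not kernel code: struct mem_region and the
region_*() helpers are hypothetical names, and the cache operation is
reduced to a counter. The point it shows is that an optional flush on
remove does not replace the invalidate before next use, because mappings
the kernel cannot see may repopulate CPU caches after the flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one memory region's cache bookkeeping. */
struct mem_region {
	bool mapped;
	bool cpu_cache_stale;		/* CPU caches may hold old lines */
	unsigned int cache_ops;		/* counts flush/invalidate calls */
};

/* Stand-in for an arch- or firmware-provided cache operation. */
static void cpu_cache_op(struct mem_region *r)
{
	r->cache_ops++;
}

void region_init(struct mem_region *r)
{
	assert(r);
	r->mapped = false;
	r->cpu_cache_stale = true;	/* prior contents are unknown */
	r->cache_ops = 0;
}

/* Remove: optionally flush now, but never trust the caches afterwards. */
void region_remove(struct mem_region *r, bool flush_on_remove)
{
	assert(r);
	r->mapped = false;
	if (flush_on_remove)
		cpu_cache_op(r);
	/* Unseen mappings could still fetch lines after the flush. */
	r->cpu_cache_stale = true;
}

/* Next usage: invalidate before the memory goes into page tables. */
void region_map(struct mem_region *r)
{
	assert(r);
	if (r->cpu_cache_stale) {
		cpu_cache_op(r);
		r->cpu_cache_stale = false;
	}
	r->mapped = true;
}
```

In this model, region_remove(r, true) followed by region_map(r) performs
two cache operations, while region_remove(r, false) followed by
region_map(r) performs one; either way the region is never mapped while
marked stale.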

Can you confirm that the current kernel arrangement is causing failures
in practice, or is this a theoretical concern? ...and if it is happening
in practice do you have the example patch that fixes it?
