From: Ira Weiny <ira.weiny@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
Shiju Jose <shiju.jose@huawei.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>,
Yazen Ghannam <yazen.ghannam@amd.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ard Biesheuvel <ardb@kernel.org>, <linux-efi@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
Ira Weiny <ira.weiny@intel.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Tony Luck <tony.luck@intel.com>, "Borislav Petkov" <bp@alien8.de>
Subject: Re: [PATCH 2/4] acpi/ghes: Process CXL Component Events
Date: Fri, 1 Mar 2024 14:05:11 -0800 [thread overview]
Message-ID: <65e2511735a41_314b3329450@iweiny-mobl.notmuch> (raw)
In-Reply-To: <65e23fc5402c1_3651e29483@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Dan Williams wrote:
> Ira Weiny wrote:
> > BIOS can configure memory devices as firmware first. This will send CXL
> > events to the firmware instead of the OS. The firmware can then send
> > these events to the OS via UEFI. Currently a configuration such as this
> > will trace a non standard event in the log. Using the specific CXL
> > trace points with the additional information CXL can provide is much
^^^^^^^^^^^^^^^^^^^^^^^^
See below.
> > more useful to users.
> > Specifically, future support can be added to CXL
> > provide the DPA to HPA mapping configured at the time of the event.
>
> One could argue that support should have happened first and taken the
> event all the way to EDAC, so this needs to merged on faith that that
> those patches are in flight.
Sure but then you also don't get the additional information already
provided by the CXL trace points. Reading this back I see that my use of
'Specifically' masked the fact that the CXL code already has better
support to decode these records and we want to leverage that.
>
> > UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> > format for CXL Component Events. The format is mostly the same as the
> > CXL Common Event Record Format. The difference is the use of a GUID in
> > the Section Type rather than a UUID as part of the event itself.
> >
> > Add GHES support to detect CXL CPER records and call into the CXL code
> > to process the event.
> >
> > Multiple methods were considered for the call into the CXL code. A
> > notifier chain was considered for the callback but the complexity did
> > not justify the use case.
>
> Not sure what this adds. If you want to talk about alternatives and
> tradeoffs, great, but that should be a comparative analysis in support
> of the chosen direction.
I'll remove it. This was carried from the previous versions.
>
> > Furthermore, the CXL code is required to be called from process
> > context as it needs to take a device lock so a simple callback
> > register proved difficult. Dan Williams suggested using 2 work items
> > as an atomic way of switching between a callback being registered and
> > not. This allows the callback to run without any locking.[1]
> >
> > Note that a local work item is required to dump any messages seen during
> > a race between any check done in cxl_cper_post_event() and the
> > scheduling of work. That said, no attempt is made to stop the addition
> > of messages into the kfifo because this local work item provides a hook
> > to add a local CXL CPER trace point in a future patch.
> >
> > This new combined patch addresses the report by Dan Carpenter[2]. Thus
> > the reported by tag.
> >
> > [1] https://lore.kernel.org/all/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch/
> > [2] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
> >
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
>
> checkpatch will whine about a missing Closes: tag.
I know. How about Suggested-by? I want to give Dan credit because your
revert already closed the actual bug.
[snip]
> > +
> > +DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, 32);
> > +static DEFINE_SPINLOCK(cxl_cper_read_lock);
> > +static DEFINE_SPINLOCK(cxl_cper_write_lock);
> > +
> > +static cxl_cper_callback cper_callback;
> > +/* cb function dumps the records */
> > +static void cxl_cper_cb_fn(struct work_struct *work)
> > +{
> > + struct cxl_cper_work_data wd;
> > +
> > + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1,
> > + &cxl_cper_read_lock)) {
>
> Why is this taking the lock on retrieval? The work item is
> single-threaded. The only potential contention is between
> cxl_cper_local_fn() and cxl_cper_cb_fn() collision, but that can be
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because of this collision, Yes.
> handled by a cancel_work_sync(&cxl_local_work) on registration to pair
> with the cancel_work_sync(&cxl_cb_work) on unregistration.
Ok yea I could use cancel_work_sync().
>
> ...but I am not sure 2 work items are needed unless some default
> processing is going to happen in the "local" case.
I thought about merging patch 3 in this one and that would make the use of
this work function more clear.
>
> > + cper_callback(wd.event_type, &wd.rec);
> > + }
> > +}
> > +static DECLARE_WORK(cxl_cb_work, cxl_cper_cb_fn);
> > +
> > +static void cxl_cper_local_fn(struct work_struct *work)
> > +{
> > + struct cxl_cper_work_data wd;
> > +
> > + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1,
> > + &cxl_cper_read_lock)) {
>
> This just looks like open coded / less efficient kfifo_reset_out().
>
> > + /* drop msg */
>
> If the proposal is to do nothing when no callback is registered then no
> need to have have cxl_local_work at all.
Patch 4 makes use of this.
>
> > + }
> > +}
> > +static DECLARE_WORK(cxl_local_work, cxl_cper_local_fn);
> > +
> > +/* Pointer for atomic switch of record processing */
> > +struct work_struct *cxl_cper_work = &cxl_local_work;
> > +
> > +static void cxl_cper_post_event(enum cxl_event_type event_type,
> > + struct cxl_cper_event_rec *rec)
> > +{
> > + struct cxl_cper_work_data wd;
> > +
> > + if (rec->hdr.length <= sizeof(rec->hdr) ||
> > + rec->hdr.length > sizeof(*rec)) {
> > + pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> > + rec->hdr.length);
> > + return;
> > + }
> > +
> > + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> > + pr_err(FW_WARN "CXL CPER invalid event\n");
> > + return;
> > + }
> > +
> > + wd.event_type = event_type;
> > + memcpy(&wd.rec, rec, sizeof(wd.rec));
>
> Unfortunate to have a double copy of the record into the stack variable
> and then again into the kfifo, but I can not immediately think of a way
> around that.
Yep. I could not either.
>
> > +
> > + kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_write_lock);
> > + schedule_work(cxl_cper_work);
>
> I think you don't need 2 work items if you write it this way:
>
> spin_lock_irqsave(&cxl_cper_write_lock, flags);
> if (cxl_cper_work) {
> if (kfifo_put(&cxl_cper_fifo, &wd))
> schedule_work(cxl_cper_work);
> else
> pr_err_ratelimited(
> "buffer overflow when queuing CXL CPER record\n");
> }
> spin_lock_irqrestore(&cxl_cper_write_lock, flags);
>
> ...and then take the write lock when modifying that cxl_cper_work
> pointer between NULL and non-NULL.
I'll merge patch 4 here. Then this will be used.
Ira
next prev parent reply other threads:[~2024-03-01 22:05 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-29 7:13 [PATCH 0/4] efi/cxl-cper: Report CXL CPER events through tracing Ira Weiny
2024-02-29 7:13 ` [PATCH 1/4] cxl/event: Add missing include files Ira Weiny
2024-03-01 20:19 ` Dan Williams
2024-03-01 21:53 ` Ira Weiny
2024-02-29 7:13 ` [PATCH 2/4] acpi/ghes: Process CXL Component Events Ira Weiny
2024-03-01 20:51 ` Dan Williams
2024-03-01 22:05 ` Ira Weiny [this message]
2024-02-29 7:13 ` [PATCH 3/4] cxl/pci: Register for and process CPER events Ira Weiny
2024-03-01 21:49 ` Dan Williams
2024-02-29 7:13 ` [PATCH 4/4] ras/events: Trace CXL CPER events even without the CXL stack loaded Ira Weiny
2024-03-01 21:59 ` Dan Williams
2024-03-01 22:19 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=65e2511735a41_314b3329450@iweiny-mobl.notmuch \
--to=ira.weiny@intel.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=dan.carpenter@linaro.org \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=rafael@kernel.org \
--cc=shiju.jose@huawei.com \
--cc=tony.luck@intel.com \
--cc=vishal.l.verma@intel.com \
--cc=yazen.ghannam@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox