From: Alison Schofield <alison.schofield@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH] cxl/core: Hold the region rwsem during poison ops
Date: Sun, 26 Nov 2023 16:03:50 -0800 [thread overview]
Message-ID: <ZWPc5gganv6rwvj+@aschofie-mobl2> (raw)
In-Reply-To: <655eaf7d73e_b2e82943a@dwillia2-xfh.jf.intel.com.notmuch>
On Wed, Nov 22, 2023 at 05:48:45PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
snip
> >
> >
> > Fixes: 7ff6ad107588 ("cxl/memdev: Add trigger_poison_list sysfs attribute")
> > Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
> > Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command")
> > Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command")
>
> I think this is an indication that the fixes would benefit from being
> broken up into at least 2 commits so that the specific side effect of
> each can be commented upon.
>
Agree. I've split into 2 patches and expanded the impact in each commit
message.
> For example:
>
> - Fix walking committed decoders in cxl_trigger_poison_list()
That's not where I found the lock issue. When walking committed
decoders, the region_rwsem was held. It is when we think there
are no regions mapped, and collect by memdev only, that I found
a gap.
>
> - Fix walking dpa to region lookups in cxl_{inject,clear}_poison()
We're probably in sync here, albeit different words.
>
> ...look like 2 separate topics in this combined patch.
>
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> > drivers/cxl/core/memdev.c | 18 +++++++++---------
> > drivers/cxl/core/region.c | 5 -----
> > 2 files changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index fc5c2b414793..961da365b097 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -227,9 +227,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd)
> > if (!port || !is_cxl_endpoint(port))
> > return -EINVAL;
> >
> > - rc = down_read_interruptible(&cxl_dpa_rwsem);
> > - if (rc)
> > - return rc;
>
> What the rationale for dropping interruptible, it seems appropriate here
> since this function is directly servicing a debugfs trigger and maybe
> someone gets tired of waiting.
>
This one is the sysfs, not debugfs, trigger. They shouldn't get tired of
waiting as it is only reading static info - the existing poison list of
the device.
However, I did put back the _interruptible...for now.
I am not clear on how it all trickles down if this gets interrupted and
need to understand that further. We are trying to maintain a poison
state, and prevent partial reads of poison lists. That is not part of
this patch, so I will study that some more and come back at it if I
still think it's an issue.
> > + down_read(&cxl_region_rwsem);
> > + down_read(&cxl_dpa_rwsem);
> >
snip
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2467,10 +2467,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> > struct cxl_poison_context ctx;
> > int rc = 0;
> >
> > - rc = down_read_interruptible(&cxl_region_rwsem);
> > - if (rc)
> > - return rc;
> > -
> > ctx = (struct cxl_poison_context) {
> > .port = port
> > };
> > @@ -2480,7 +2476,6 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
> > rc = cxl_get_poison_unmapped(to_cxl_memdev(port->uport_dev),
> > &ctx);
> >
> > - up_read(&cxl_region_rwsem);
> > return rc;
>
> This hunk deserves to be called out that region locking is being
> upleveled as part of topic 1, and this reinforces splitting the 2 topics
> into 2 patches.
done.
>
> Keep the _interruptible versions throughout, if you want to drop
> interruptible that should be a separate follow-on behavior change patch.
> The need to keep _interruptible also obviates conversion to cleanup.h
> helpers for now.
done.
prev parent reply other threads:[~2023-11-27 0:06 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-14 2:53 [PATCH] cxl/core: Hold the region rwsem during poison ops alison.schofield
2023-11-14 18:20 ` Dave Jiang
2023-11-14 18:25 ` [PATCH] cxl: Convert pioson ops rwsem usages to scope based resource management Dave Jiang
2023-11-15 23:32 ` Alison Schofield
2023-11-15 23:55 ` Dave Jiang
2023-11-16 2:17 ` Alison Schofield
2023-11-16 16:27 ` Dave Jiang
2023-11-23 1:12 ` Dan Williams
2023-11-23 1:33 ` Dan Williams
2023-11-16 22:14 ` [PATCH] cxl/core: Hold the region rwsem during poison ops Davidlohr Bueso
2023-11-20 19:07 ` Alison Schofield
2023-11-23 1:48 ` Dan Williams
2023-11-27 0:03 ` Alison Schofield [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZWPc5gganv6rwvj+@aschofie-mobl2 \
--to=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox