From: Dan Williams <dan.j.williams@intel.com>
To: Davidlohr Bueso <dave@stgolabs.net>,
Dan Williams <dan.j.williams@intel.com>
Cc: <dave.jiang@intel.com>, <alison.schofield@intel.com>,
<vishal.l.verma@intel.com>, <Jonathan.Cameron@huawei.com>,
<fan.ni@samsung.com>, <a.manzanares@samsung.com>,
<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH v2] cxl/mbox: Add background cmd handling machinery
Date: Mon, 22 May 2023 11:19:13 -0700 [thread overview]
Message-ID: <646bb221db164_33fb3294b3@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <64jusopypczmfmkeg5ulzw6gwbfv4mdnk2jilmja7iyr5dsukc@vkpwdjsjsbk5>
Davidlohr Bueso wrote:
> On Fri, 19 May 2023, Dan Williams wrote:
>
> >> +static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> >> +{
> >> + struct cxl_dev_state *cxlds = id;
> >> +
> >> + /* spurious or raced with hw? */
> >> + if (unlikely(!cxl_mbox_background_complete(cxlds))) {
> >
> >I expect this will fire all the time. While the specification allows for
> >a vector per-reason there's nothing stopping an implementation from
> >using one vector for all reasons. There's little incentive for spending
> >silicon gates on exclusive vectors since none of the CXL interrupts
> >service software fast paths.
>
> fyi I have removed the warn below entirely. Handler is now simply:
>
> if (cxl_mbox_background_complete())
> wake_up();
> return IRQ_HANDLED;
>
> >
> >> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> >> +
> >> + dev_warn(&pdev->dev,
> >> + "Mailbox background operation IRQ but incomplete\n");
> >> + goto done;
> >> + }
> >> +
> >> + /* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> >> + rcuwait_wake_up(&cxlds->mbox_wait);
> >> +done:
> >> + return IRQ_HANDLED;
> >> +}
> >> +
> >> /**
> >> * __cxl_pci_mbox_send_cmd() - Execute a mailbox command
> >> * @cxlds: The device state to communicate with.
> >> @@ -177,6 +205,57 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds,
> >> mbox_cmd->return_code =
> >> FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg);
> >>
> >> + /*
> >> + * Handle the background command in a synchronous manner.
> >> + *
> >> + * All other mailbox commands will serialize/queue on the mbox_mutex,
> >> + * which we currently hold. Furthermore this also guarantees that
> >> + * cxl_mbox_background_complete() checks are safe amongst each other,
> >> + * in that no new bg operation can occur in between.
> >> + *
> >> + * Background operations are timesliced in accordance with the nature
> >> + * of the command. In the event of timeout, the mailbox state is
> >> + * indeterminate until the next successful command submission and the
> >> + * driver can get back in sync with the hardware state.
> >> + */
> >> + if (mbox_cmd->return_code == CXL_MBOX_CMD_RC_BACKGROUND) {
> >> + long ret;
> >> + u64 bg_status_reg;
> >> + int i, timeout = mbox_cmd->poll_interval;
> >> +
> >> + dev_dbg(dev, "Mailbox background operation (0x%04x) started\n",
> >> + mbox_cmd->opcode);
> >> +
> >> + for (i = 0; i < mbox_cmd->poll_count; i++) {
> >> + ret = rcuwait_wait_event_timeout(&cxlds->mbox_wait,
> >> + cxl_mbox_background_complete(cxlds),
> >> + TASK_INTERRUPTIBLE,
> >> + msecs_to_jiffies(timeout));
> >> + if (ret > 0)
> >> + break;
> >> + if (ret < 0) /* interrupted by a signal */
> >
> >Have you tested this path? I am curious what happens on the next command
> >submission? As far as I can see foreground commands would not be
> >impacted but the next background submission would fail instead of wait,
> >right?
>
> No, I haven't tested this path. My expectation here is that this would be
> similar to the timeout case in that the driver returns but the hw keeps
> processing the command. The next command, fg or bg, would be at the mercy
> of the hw doing the right thing until it's done with the canceled one.
Right, it's that "mercy" part where there needs to be documentation of
what to expect.
If someone sends SIGTERM after triggering a scan media and then
immediately resubmits it maybe the right answer is to go into another
interruptible sleep awaiting the last background state to clear out. It
just seems like returning an error leaves userspace to fend for itself
with little visibility of what it needs to do next. SIGTERM during a
background command means, "I do not care about the previous result".
Given the driver is mitigating long running background cycles that
implies that waiting before submitting the next colliding command would
not be onerous. They can always SIGTERM again if the wait gets too long.
> >Perhaps a dev_err_ratelimited() message and an EBUSY status for when
> >that happens to at least indicate, "hardware is still chewing on a
> >cancelled request". Later we can think about whether that's recoverable
> >via a reset.
>
> Well but reaching this path already assumes hw didn't error out the current
> command because the canceled one was still being run.
>
> Alternatively we could also just make the sleep uninterruptible, but because
> this comes from the user triggering the bg op, I thought against it.
I don't think that would solve the problem of how to detect when the
driver is out of sync with the device and what to do to resolve that
situation. An inopportune SIGTERM is indistinguishable from getting the
polling interval wrong.
>
> >> + return ret;
>
> rcuwait, unlike regular wait, returns -EINTR instead of -ERESTARTSYS. I've
> updated the return value here to the latter.
>
> >> + }
> >> +
> >> + if (!cxl_mbox_background_complete(cxlds)) {
> >> + u64 md_status =
> >> + readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> >> +
> >> + cxl_cmd_err(cxlds->dev, mbox_cmd, md_status,
> >> + "background timeout");
> >> + return -ETIMEDOUT;
> >> + }
> >> +
> >> + bg_status_reg = readq(cxlds->regs.mbox +
> >> + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> >> + mbox_cmd->return_code =
> >> + FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK,
> >> + bg_status_reg);
> >> + dev_dbg(dev,
> >> + "Mailbox background operation (0x%04x) completed\n",
> >> + mbox_cmd->opcode);
> >> + }
> >> +
> >> if (mbox_cmd->return_code != CXL_MBOX_CMD_RC_SUCCESS) {
> >> dev_dbg(dev, "Mailbox operation had an error: %s\n",
> >> cxl_mbox_cmd_rc2str(mbox_cmd));
> >> @@ -271,6 +350,29 @@ static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> >> dev_dbg(cxlds->dev, "Mailbox payload sized %zu",
> >> cxlds->payload_size);
> >>
> >> + rcuwait_init(&cxlds->mbox_wait);
> >> + if (cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) {
> >> + int irq, msgnum;
> >> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> >> +
> >> + msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
> >> + irq = pci_irq_vector(pdev, msgnum);
> >> + if (irq < 0)
> >> + goto mbox_poll;
> >> +
> >> + if (devm_request_irq(cxlds->dev, irq, cxl_pci_mbox_irq,
> >> + IRQF_SHARED, NULL, cxlds))
> >
> >This needs IRQF_ONESHOT and that @dev_id needs to be globally unique in
> >case @irq is shared. So I think you want to factor out a common helper
> >from cxl_event_req_irq() that this can reuse. I.e. cxl_event_req_irq()
> >already gets the flags and @dev_id right, so just create a cxl_req_irq()
> >that takes the irq as an argument and share it.
>
> For the oneshot this was a hardirq, so I don't think this is needed, but
> regardless, I've added the following helper in a new patch which both
> events and mbox can call:
Ah, right, this was hard-irq only, I missed that.
I wouldn't be opposed to cxl_request_irq() following pci_request_irq()'s
lead and taking multiple handler arguments to skip the thread invocation
when it is not needed.
>
> static int cxl_request_irq(struct cxl_dev_state *cxlds,
> int irq, irq_handler_t handler)
> {
> struct device *dev = cxlds->dev;
> struct cxl_dev_id *dev_id;
>
> /* dev_id must be globally unique and must contain the cxlds */
> dev_id = devm_kzalloc(dev, sizeof(*dev_id), GFP_KERNEL);
> if (!dev_id)
> return -ENOMEM;
> dev_id->cxlds = cxlds;
>
> return devm_request_threaded_irq(dev, irq, NULL, handler,
> IRQF_SHARED | IRQF_ONESHOT, NULL,
> dev_id);
> }
>
> >
> >> + goto mbox_poll;
> >> +
> >> + /* only enable background cmd mbox irq support */
> >> + writel(CXLDEV_MBOX_CTRL_BG_CMD_IRQ,
> >> + cxlds->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET);
> >
> >Just in case some other code ever enables doorbell interrupts it would
> >be nice if this performed a read-modify-write rather than assuming it
> >can clobber that setting.
>
> Hmm do you mean enabling/disabling dynamically?
No, grep for:
"write.*CTRL"
...and observe that all writes to control registers are doing a
read-modify-write sequence.
> Otherwise I would have
> expected this to be a one time thing which we could just OR both bits.
> Maybe this could be revisited when needed? Jonathan suggested enabling
> doorbell irqs in the future, which I plan on looking into at some point.
Sure, that code might be in another function also performing its own
read-modify-write, so my comment just asking for this to be consistent
with other control register updates in the driver.
next prev parent reply other threads:[~2023-05-22 18:19 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-02 17:18 [PATCH 0/3] cxl: Handle background commands Davidlohr Bueso
2023-05-02 17:18 ` [PATCH 1/3] rcuwait: Support timeouts Davidlohr Bueso
2023-05-19 21:38 ` Dan Williams
2023-05-19 22:55 ` Davidlohr Bueso
2023-05-20 11:03 ` Peter Zijlstra
2023-05-02 17:18 ` [PATCH 2/3] cxl/pci: Allocate irq vectors earlier in pci probe Davidlohr Bueso
2023-05-15 10:08 ` Jonathan Cameron
2023-05-02 17:18 ` [PATCH 3/3] cxl/mbox: Add background cmd handling machinery Davidlohr Bueso
2023-05-02 17:55 ` Davidlohr Bueso
2023-05-03 14:57 ` [PATCH v2] " Davidlohr Bueso
2023-05-15 10:30 ` Jonathan Cameron
2023-05-15 15:40 ` Davidlohr Bueso
2023-05-15 16:19 ` Jonathan Cameron
2023-05-16 7:58 ` Li, Ming
2023-05-16 17:02 ` Davidlohr Bueso
2023-05-19 23:13 ` Dan Williams
2023-05-22 16:58 ` Davidlohr Bueso
2023-05-22 18:19 ` Dan Williams [this message]
2023-05-22 18:57 ` Davidlohr Bueso
2023-05-22 20:16 ` Dan Williams
2023-05-22 20:28 ` Davidlohr Bueso
2023-05-22 21:21 ` Dan Williams
2023-05-22 21:26 ` Davidlohr Bueso
2023-05-22 22:48 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=646bb221db164_33fb3294b3@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=a.manzanares@samsung.com \
--cc=alison.schofield@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=fan.ni@samsung.com \
--cc=linux-cxl@vger.kernel.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox