qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Ira Weiny <ira.weiny@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Gregory Price <gregory.price@memverge.com>
Cc: <qemu-devel@nongnu.org>, Michael Tsirkin <mst@redhat.com>,
	Ben Widawsky <bwidawsk@kernel.org>, <linux-cxl@vger.kernel.org>,
	Huai-Cheng Kuo <hchkuo@avery-design.com.tw>,
	Chris Browy <cbrowy@avery-design.com>, <linuxarm@huawei.com>
Subject: Re: [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
Date: Mon, 31 Oct 2022 12:51:04 -0700	[thread overview]
Message-ID: <Y2AnKB88ALYm9c5L@iweiny-desk3> (raw)
In-Reply-To: <20221025115617.000035b6@huawei.com>

On Tue, Oct 25, 2022 at 11:56:17AM +0100, Jonathan Cameron wrote:
> On Mon, 24 Oct 2022 13:42:54 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Fri, Oct 14, 2022 at 04:10:44PM +0100, Jonathan Cameron wrote:
> > > From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > > 
> > > The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
> > > in "-device cxl-type3"'s command option. The file is required to provide
> > > the whole CDAT table in binary mode. The other is to use the default
> > > that provides some 'reasonable' numbers based on type of memory and
> > > size.
> > > 
> > > The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
> > > capability offset 0x190. The config read/write to this capability range
> > > can be generated in the OS to request the CDAT data.
> > > 
> > > Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > > Signed-off-by: Chris Browy <cbrowy@avery-design.com>
> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > >   
> > 
> > In reviewing this patch under a debug kernel, I'm discovering some
> > warnings / tracebacks that I want to get a sanity check on.  They appear
> > to be warnings, but i'm not 100% sure what to make of them.
> > 
> > I get the following stack traces during boot (on the x86 machine).
> > 
> > Removing the type-3 device from the configuration prevents the traceback
> > from occurring, suggesting it's something to do with that code in
> > particular (which tracks with the patch here)
> 
> Thanks Gregory,
> 
> We have an INIT_WORK() in pci_doe_submit_task()
> that in the pci_doe_discovery() call is based a work structure that is
> on the stack.  As such should be INIT_WORK_ONSTACK()
> 
> This is a little tricky becaues there is no particular reason to assume
> all struct pci_doe_task instances will be on the stack though they are
> today.  We don't really want to break the layering as would be necessary
> to have this init at the locations where we otherwise initialize the
> containing structure.
> 
> My first thought is add an 'onstack' bool to either the pci_doe_submit_task()
> or perhaps better would be to add it to the pci_doe_task() and have
> macros like DECLARE_CDAT_DOE_TASK_ONSTACK() set it appropriately so we
> can call the right INIT_WORK_*() variant later.
> 
> Ira / others, which way to go to fix this?

Yes Jonathan is on the right track here.  Though I was confused why no one had
ever seen this till now.  I see it was because I never ran with the
CONFIG_DEBUG_OBJECTS_WORK option.

While we could have a declaration macro.  I think the best solution is to
separate the task from the internal implementation; see patch below.  I was
never fully happy with the idea of having the work struct in the user visible
task object anyway.

> 
> I'll also reply to the last version of that series to make sure people see
> this discussion.

Thanks I've been looking for time to look at this series and have missed it.
So the ping over there helped!  :-D

Ira

From 9e16d5e399723412acbaec1bb9be807d5e5bf7fc Mon Sep 17 00:00:00 2001
From: Ira Weiny <ira.weiny@intel.com>
Date: Mon, 31 Oct 2022 11:31:30 -0700
Subject: [RFC PATCH] PCI/doe: Fix work struct declaration

The callers of pci_doe_submit_task() allocate the pci_doe_task on the
stack.  This causes the work structure to be allocated on the stack
without pci_doe_submit_task() knowing.  Work item initialization needs
to be done with either INIT_WORK_ONSTACK() or INIT_WORK() depending on
how the work item is allocated.

Jonathan suggested creating doe task allocation macros such as
DECLARE_CDAT_DOE_TASK_ONSTACK().  This would work, however, having the
work structure be part of pci_doe_task seems like a layering violation.

Create an internal struct pci_doe_work which represents the work and use
this for the work queue.

RFC because I'm wondering if a gfp_t flags parameter should be added to
pci_doe_submit_task() or not.

[1] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m88a7f50dcce52f30c8bf5c3dcc06fa9843b54a2d

Reported-by: Gregory Price <gregory.price@memverge.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/pci/doe.c       | 56 +++++++++++++++++++++++++++--------------
 include/linux/pci-doe.h |  4 ---
 2 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index e402f05068a5..6a9fa57e2cac 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -29,6 +29,12 @@
 #define PCI_DOE_FLAG_CANCEL	0
 #define PCI_DOE_FLAG_DEAD	1
 
+struct pci_doe_work {
+	struct pci_doe_task *task;
+	struct work_struct work;
+	struct pci_doe_mb *doe_mb;
+};
+
 /**
  * struct pci_doe_mb - State for a single DOE mailbox
  *
@@ -214,9 +220,9 @@ static void signal_task_complete(struct pci_doe_task *task, int rv)
 	task->complete(task);
 }
 
-static void signal_task_abort(struct pci_doe_task *task, int rv)
+static void signal_task_abort(struct pci_doe_mb *doe_mb, struct pci_doe_task *task,
+			      int rv)
 {
-	struct pci_doe_mb *doe_mb = task->doe_mb;
 	struct pci_dev *pdev = doe_mb->pdev;
 
 	if (pci_doe_abort(doe_mb)) {
@@ -233,9 +239,10 @@ static void signal_task_abort(struct pci_doe_task *task, int rv)
 
 static void doe_statemachine_work(struct work_struct *work)
 {
-	struct pci_doe_task *task = container_of(work, struct pci_doe_task,
-						 work);
-	struct pci_doe_mb *doe_mb = task->doe_mb;
+	struct pci_doe_work *doe_work = container_of(work, struct pci_doe_work,
+						     work);
+	struct pci_doe_task *task = doe_work->task;
+	struct pci_doe_mb *doe_mb = doe_work->doe_mb;
 	struct pci_dev *pdev = doe_mb->pdev;
 	int offset = doe_mb->cap_offset;
 	unsigned long timeout_jiffies;
@@ -244,7 +251,7 @@ static void doe_statemachine_work(struct work_struct *work)
 
 	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags)) {
 		signal_task_complete(task, -EIO);
-		return;
+		goto free_work;
 	}
 
 	/* Send request */
@@ -260,8 +267,8 @@ static void doe_statemachine_work(struct work_struct *work)
 		if (rc == -EBUSY)
 			dev_err_ratelimited(&pdev->dev, "[%x] busy detected; another entity is sending conflicting requests\n",
 					    offset);
-		signal_task_abort(task, rc);
-		return;
+		signal_task_abort(doe_mb, task, rc);
+		goto free_work;
 	}
 
 	timeout_jiffies = jiffies + PCI_DOE_TIMEOUT;
@@ -269,30 +276,32 @@ static void doe_statemachine_work(struct work_struct *work)
 retry_resp:
 	pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
 	if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
-		signal_task_abort(task, -EIO);
-		return;
+		signal_task_abort(doe_mb, task, -EIO);
+		goto free_work;
 	}
 
 	if (!FIELD_GET(PCI_DOE_STATUS_DATA_OBJECT_READY, val)) {
 		if (time_after(jiffies, timeout_jiffies)) {
-			signal_task_abort(task, -EIO);
-			return;
+			signal_task_abort(doe_mb, task, -EIO);
+			goto free_work;
 		}
 		rc = pci_doe_wait(doe_mb, PCI_DOE_POLL_INTERVAL);
 		if (rc) {
-			signal_task_abort(task, rc);
-			return;
+			signal_task_abort(doe_mb, task, rc);
+			goto free_work;
 		}
 		goto retry_resp;
 	}
 
 	rc  = pci_doe_recv_resp(doe_mb, task);
 	if (rc < 0) {
-		signal_task_abort(task, rc);
-		return;
+		signal_task_abort(doe_mb, task, rc);
+		goto free_work;
 	}
 
 	signal_task_complete(task, rc);
+free_work:
+	kfree(doe_work);
 }
 
 static void pci_doe_task_complete(struct pci_doe_task *task)
@@ -510,10 +519,14 @@ EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
  *
  * Excess data will be discarded.
  *
+ * Context: non-atomic; allocates with GFP_KERNEL
+ *
  * RETURNS: 0 when task has been successfully queued, -ERRNO on error
  */
 int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 {
+	struct pci_doe_work *work;
+
 	if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type))
 		return -EINVAL;
 
@@ -528,9 +541,14 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
 		return -EIO;
 
-	task->doe_mb = doe_mb;
-	INIT_WORK(&task->work, doe_statemachine_work);
-	queue_work(doe_mb->work_queue, &task->work);
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	work->task = task;
+	work->doe_mb = doe_mb;
+	INIT_WORK(&work->work, doe_statemachine_work);
+	queue_work(doe_mb->work_queue, &work->work);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_doe_submit_task);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index ed9b4df792b8..7c573cdbf17b 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -52,10 +52,6 @@ struct pci_doe_task {
 	int rv;
 	void (*complete)(struct pci_doe_task *task);
 	void *private;
-
-	/* No need for the user to initialize these fields */
-	struct work_struct work;
-	struct pci_doe_mb *doe_mb;
 };
 
 /**
-- 
2.37.2


  reply	other threads:[~2022-10-31 19:52 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 1/5] hw/pci: PCIe Data Object Exchange emulation Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 2/5] hw/mem/cxl-type3: Add MSIX support Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron via
2022-10-24 17:42   ` Gregory Price
2022-10-25 10:56     ` Jonathan Cameron via
2022-10-31 19:51       ` Ira Weiny [this message]
2022-10-14 15:10 ` [PATCH v9 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE Jonathan Cameron via
2022-10-25 22:06 ` [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron via
2022-10-25 22:30   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2AnKB88ALYm9c5L@iweiny-desk3 \
    --to=ira.weiny@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=bwidawsk@kernel.org \
    --cc=cbrowy@avery-design.com \
    --cc=gregory.price@memverge.com \
    --cc=hchkuo@avery-design.com.tw \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).