All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ira Weiny <ira.weiny@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Gregory Price <gregory.price@memverge.com>
Cc: <qemu-devel@nongnu.org>, Michael Tsirkin <mst@redhat.com>,
	Ben Widawsky <bwidawsk@kernel.org>, <linux-cxl@vger.kernel.org>,
	Huai-Cheng Kuo <hchkuo@avery-design.com.tw>,
	Chris Browy <cbrowy@avery-design.com>, <linuxarm@huawei.com>
Subject: Re: [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
Date: Mon, 31 Oct 2022 12:51:04 -0700	[thread overview]
Message-ID: <Y2AnKB88ALYm9c5L@iweiny-desk3> (raw)
In-Reply-To: <20221025115617.000035b6@huawei.com>

On Tue, Oct 25, 2022 at 11:56:17AM +0100, Jonathan Cameron wrote:
> On Mon, 24 Oct 2022 13:42:54 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Fri, Oct 14, 2022 at 04:10:44PM +0100, Jonathan Cameron wrote:
> > > From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > > 
> > > The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
> > > in "-device cxl-type3"'s command option. The file is required to provide
> > > the whole CDAT table in binary mode. The other is to use the default
> > > that provides some 'reasonable' numbers based on type of memory and
> > > size.
> > > 
> > > The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
> > > capability offset 0x190. The config read/write to this capability range
> > > can be generated in the OS to request the CDAT data.
> > > 
> > > Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > > Signed-off-by: Chris Browy <cbrowy@avery-design.com>
> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > >   
> > 
> > In reviewing this patch under a debug kernel, I'm discovering some
> > warnings / tracebacks that I want to get a sanity check on.  They appear
> > to be warnings, but i'm not 100% sure what to make of them.
> > 
> > I get the following stack traces during boot (on the x86 machine).
> > 
> > Removing the type-3 device from the configuration prevents the traceback
> > from occurring, suggesting it's something to do with that code in
> > particular (which tracks with the patch here)
> 
> Thanks Gregory,
> 
> We have an INIT_WORK() in pci_doe_submit_task()
> that in the pci_doe_discovery() call is based a work structure that is
> on the stack.  As such should be INIT_WORK_ONSTACK()
> 
> This is a little tricky becaues there is no particular reason to assume
> all struct pci_doe_task instances will be on the stack though they are
> today.  We don't really want to break the layering as would be necessary
> to have this init at the locations where we otherwise initialize the
> containing structure.
> 
> My first thought is add an 'onstack' bool to either the pci_doe_submit_task()
> or perhaps better would be to add it to the pci_doe_task() and have
> macros like DECLARE_CDAT_DOE_TASK_ONSTACK() set it appropriately so we
> can call the right INIT_WORK_*() variant later.
> 
> Ira / others, which way to go to fix this?

Yes Jonathan is on the right track here.  Though I was confused why no one had
ever seen this till now.  I see it was because I never ran with the
CONFIG_DEBUG_OBJECTS_WORK option.

While we could have a declaration macro.  I think the best solution is to
separate the task from the internal implementation; see patch below.  I was
never fully happy with the idea of having the work struct in the user visible
task object anyway.

> 
> I'll also reply to the last version of that series to make sure people see
> this discussion.

Thanks I've been looking for time to look at this series and have missed it.
So the ping over there helped!  :-D

Ira

From 9e16d5e399723412acbaec1bb9be807d5e5bf7fc Mon Sep 17 00:00:00 2001
From: Ira Weiny <ira.weiny@intel.com>
Date: Mon, 31 Oct 2022 11:31:30 -0700
Subject: [RFC PATCH] PCI/doe: Fix work struct declaration

The callers of pci_doe_submit_task() allocate the pci_doe_task on the
stack.  This causes the work structure to be allocated on the stack
without pci_doe_submit_task() knowing.  Work item initialization needs
to be done with either INIT_WORK_ONSTACK() or INIT_WORK() depending on
how the work item is allocated.

Jonathan suggested creating doe task allocation macros such as
DECLARE_CDAT_DOE_TASK_ONSTACK().  This would work, however, having the
work structure be part of pci_doe_task seems like a layering violation.

Create an internal struct pci_doe_work which represents the work and use
this for the work queue.

RFC because I'm wondering if a gfp_t flags parameter should be added to
pci_doe_submit_task() or not.

[1] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m88a7f50dcce52f30c8bf5c3dcc06fa9843b54a2d

Reported-by: Gregory Price <gregory.price@memverge.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/pci/doe.c       | 56 +++++++++++++++++++++++++++--------------
 include/linux/pci-doe.h |  4 ---
 2 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index e402f05068a5..6a9fa57e2cac 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -29,6 +29,12 @@
 #define PCI_DOE_FLAG_CANCEL	0
 #define PCI_DOE_FLAG_DEAD	1
 
+struct pci_doe_work {
+	struct pci_doe_task *task;
+	struct work_struct work;
+	struct pci_doe_mb *doe_mb;
+};
+
 /**
  * struct pci_doe_mb - State for a single DOE mailbox
  *
@@ -214,9 +220,9 @@ static void signal_task_complete(struct pci_doe_task *task, int rv)
 	task->complete(task);
 }
 
-static void signal_task_abort(struct pci_doe_task *task, int rv)
+static void signal_task_abort(struct pci_doe_mb *doe_mb, struct pci_doe_task *task,
+			      int rv)
 {
-	struct pci_doe_mb *doe_mb = task->doe_mb;
 	struct pci_dev *pdev = doe_mb->pdev;
 
 	if (pci_doe_abort(doe_mb)) {
@@ -233,9 +239,10 @@ static void signal_task_abort(struct pci_doe_task *task, int rv)
 
 static void doe_statemachine_work(struct work_struct *work)
 {
-	struct pci_doe_task *task = container_of(work, struct pci_doe_task,
-						 work);
-	struct pci_doe_mb *doe_mb = task->doe_mb;
+	struct pci_doe_work *doe_work = container_of(work, struct pci_doe_work,
+						     work);
+	struct pci_doe_task *task = doe_work->task;
+	struct pci_doe_mb *doe_mb = doe_work->doe_mb;
 	struct pci_dev *pdev = doe_mb->pdev;
 	int offset = doe_mb->cap_offset;
 	unsigned long timeout_jiffies;
@@ -244,7 +251,7 @@ static void doe_statemachine_work(struct work_struct *work)
 
 	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags)) {
 		signal_task_complete(task, -EIO);
-		return;
+		goto free_work;
 	}
 
 	/* Send request */
@@ -260,8 +267,8 @@ static void doe_statemachine_work(struct work_struct *work)
 		if (rc == -EBUSY)
 			dev_err_ratelimited(&pdev->dev, "[%x] busy detected; another entity is sending conflicting requests\n",
 					    offset);
-		signal_task_abort(task, rc);
-		return;
+		signal_task_abort(doe_mb, task, rc);
+		goto free_work;
 	}
 
 	timeout_jiffies = jiffies + PCI_DOE_TIMEOUT;
@@ -269,30 +276,32 @@ static void doe_statemachine_work(struct work_struct *work)
 retry_resp:
 	pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
 	if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
-		signal_task_abort(task, -EIO);
-		return;
+		signal_task_abort(doe_mb, task, -EIO);
+		goto free_work;
 	}
 
 	if (!FIELD_GET(PCI_DOE_STATUS_DATA_OBJECT_READY, val)) {
 		if (time_after(jiffies, timeout_jiffies)) {
-			signal_task_abort(task, -EIO);
-			return;
+			signal_task_abort(doe_mb, task, -EIO);
+			goto free_work;
 		}
 		rc = pci_doe_wait(doe_mb, PCI_DOE_POLL_INTERVAL);
 		if (rc) {
-			signal_task_abort(task, rc);
-			return;
+			signal_task_abort(doe_mb, task, rc);
+			goto free_work;
 		}
 		goto retry_resp;
 	}
 
 	rc  = pci_doe_recv_resp(doe_mb, task);
 	if (rc < 0) {
-		signal_task_abort(task, rc);
-		return;
+		signal_task_abort(doe_mb, task, rc);
+		goto free_work;
 	}
 
 	signal_task_complete(task, rc);
+free_work:
+	kfree(doe_work);
 }
 
 static void pci_doe_task_complete(struct pci_doe_task *task)
@@ -510,10 +519,14 @@ EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
  *
  * Excess data will be discarded.
  *
+ * Context: non-atomic; allocates with GFP_KERNEL
+ *
  * RETURNS: 0 when task has been successfully queued, -ERRNO on error
  */
 int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 {
+	struct pci_doe_work *work;
+
 	if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type))
 		return -EINVAL;
 
@@ -528,9 +541,14 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
 		return -EIO;
 
-	task->doe_mb = doe_mb;
-	INIT_WORK(&task->work, doe_statemachine_work);
-	queue_work(doe_mb->work_queue, &task->work);
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	work->task = task;
+	work->doe_mb = doe_mb;
+	INIT_WORK(&work->work, doe_statemachine_work);
+	queue_work(doe_mb->work_queue, &work->work);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_doe_submit_task);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index ed9b4df792b8..7c573cdbf17b 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -52,10 +52,6 @@ struct pci_doe_task {
 	int rv;
 	void (*complete)(struct pci_doe_task *task);
 	void *private;
-
-	/* No need for the user to initialize these fields */
-	struct work_struct work;
-	struct pci_doe_mb *doe_mb;
 };
 
 /**
-- 
2.37.2

  reply	other threads:[~2022-10-31 19:51 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
2022-10-14 15:10 ` Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 1/5] hw/pci: PCIe Data Object Exchange emulation Jonathan Cameron
2022-10-14 15:10   ` Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 2/5] hw/mem/cxl-type3: Add MSIX support Jonathan Cameron
2022-10-14 15:10   ` Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation Jonathan Cameron
2022-10-14 15:10   ` Jonathan Cameron via
2022-10-14 15:10 ` [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron
2022-10-14 15:10   ` Jonathan Cameron via
2022-10-24 17:42   ` Gregory Price
2022-10-25 10:56     ` Jonathan Cameron
2022-10-25 10:56       ` Jonathan Cameron via
2022-10-31 19:51       ` Ira Weiny [this message]
2022-10-14 15:10 ` [PATCH v9 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE Jonathan Cameron
2022-10-14 15:10   ` Jonathan Cameron via
2022-10-25 22:06 ` [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
2022-10-25 22:06   ` Jonathan Cameron via
2022-10-25 22:30   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2AnKB88ALYm9c5L@iweiny-desk3 \
    --to=ira.weiny@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=bwidawsk@kernel.org \
    --cc=cbrowy@avery-design.com \
    --cc=gregory.price@memverge.com \
    --cc=hchkuo@avery-design.com.tw \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.