* PATCH: platforms: avoid queuing work if possible
@ 2014-01-31 1:24 Peter Chang
2014-01-31 11:53 ` Tejun Heo
0 siblings, 1 reply; 5+ messages in thread
From: Peter Chang @ 2014-01-31 1:24 UTC (permalink / raw)
To: linux-pci@vger.kernel.org, tj
[-- Attachment #1: Type: text/plain, Size: 267 bytes --]
we noticed a change in module loading behavior and tracked it back to
a small change in behavior introduced by
774a1221e862b343388347bac9b318767336b20b. this patch tries to avoid
queuing if possible, but if we do queue then it copies the relevant
task flag back from the task on the target cpu.
\p
[-- Attachment #2: 0001-platforms-avoid-queuing-work-if-possible.patch --]
[-- Type: text/x-patch, Size: 2686 bytes --]
From 9e77188d8a8e74b7b4e3f3e7c6bb0deb8fdf7eb4 Mon Sep 17 00:00:00 2001
From: peter chang <dpf@google.com>
Date: Mon, 27 Jan 2014 16:24:53 -0800
Subject: [PATCH] platforms: avoid queuing work if possible
774a1221e862b343388347bac9b318767336b20b tries to avoid an expensive sync
by marking the current task iff it uses the async_schedule() machinery.
however, if part of the driver's probe is called through a worker
thread, the bookkeeping fails to mark the correct task as having used
the async machinery.
this works around the issue in the common case where we're already
running on the correct cpu. still, it seems reasonable to avoid the
worker thread when we don't have to.
Tested:
- checked on vebmy2 (ibis + satasquatch) which is where the original issue
was discovered.
- checked on fdha347 (ikaria + loggerhead + satasquatch) to show that
nothing changed.
Change-Id: Ibe8996cf652735c74a9e06ba9fd03118078395bd
Google-Bug-Id: 12632867
Signed-off-by: peter chang <dpf@google.com>
---
drivers/pci/pci-driver.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e6515e2..9f7e1e9 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -243,6 +243,7 @@ struct drv_dev_and_id {
struct pci_driver *drv;
struct pci_dev *dev;
const struct pci_device_id *id;
+ unsigned int task_flags;
};
static long local_pci_probe(void *_ddi)
@@ -267,7 +268,8 @@ static long local_pci_probe(void *_ddi)
if (rc) {
pci_dev->driver = NULL;
pm_runtime_put_sync(dev);
- }
+ } else if (current)
+ ddi->task_flags = current->flags;
return rc;
}
@@ -275,7 +277,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
const struct pci_device_id *id)
{
int error, node;
- struct drv_dev_and_id ddi = { drv, dev, id };
+ struct drv_dev_and_id ddi = { drv, dev, id, 0 };
/* Execute driver initialization on node where the device's
bus is attached to. This way the driver likely allocates
@@ -287,9 +289,16 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
get_online_cpus();
cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
- if (cpu < nr_cpu_ids)
+ /* try not to queue work if we're already on the target cpu */
+ if (cpu < nr_cpu_ids && cpu != get_cpu()) {
error = work_on_cpu(cpu, local_pci_probe, &ddi);
- else
+ /* copy out some task flags for the module loading
+ * case.
+ */
+ if (!error && current)
+ current->flags |= (ddi.task_flags &
+ PF_USED_ASYNC);
+ } else
error = local_pci_probe(&ddi);
put_online_cpus();
} else
--
1.8.5.3
* Re: PATCH: platforms: avoid queuing work if possible
2014-01-31 1:24 PATCH: platforms: avoid queuing work if possible Peter Chang
@ 2014-01-31 11:53 ` Tejun Heo
2014-01-31 12:00 ` Tejun Heo
2014-02-05 4:13 ` Peter Chang
0 siblings, 2 replies; 5+ messages in thread
From: Tejun Heo @ 2014-01-31 11:53 UTC (permalink / raw)
To: Peter Chang; +Cc: linux-pci@vger.kernel.org
Hello,
On Thu, Jan 30, 2014 at 05:24:16PM -0800, Peter Chang wrote:
> 774a1221e862b343388347bac9b318767336b20b tries to avoid an expensive sync
> by marking the current task iff it uses the async_schedule() machinery.
That wasn't an optimization. It was a workaround for deadlock.
> however, if part of the driver's probe is called through a worker
> thread, the bookkeeping fails to mark the correct task as having used
> the async machinery.
>
> this is a workaround for the common case that we're already running
> on the correct cpu. however, it seems sort of reasonable to not use
> the worker thread if we don't have to.
>
> Tested:
> - checked on vebmy2 (ibis + satasquatch) which is where the original issue
> was discovered.
> - checked on fdha347 (ikaria + loggerhead + satasquatch) to show that
> nothing changed.
The patch description doesn't describe the reasons for the change at
all. What issues have you encountered? What are you trying to
achieve?
> Change-Id: Ibe8996cf652735c74a9e06ba9fd03118078395bd
> Google-Bug-Id: 12632867
The above tags don't mean anything upstream. Please drop them when
posting patches.
Thanks.
--
tejun
* Re: PATCH: platforms: avoid queuing work if possible
2014-01-31 11:53 ` Tejun Heo
@ 2014-01-31 12:00 ` Tejun Heo
2014-02-05 4:13 ` Peter Chang
1 sibling, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2014-01-31 12:00 UTC (permalink / raw)
To: Peter Chang; +Cc: linux-pci@vger.kernel.org
On Fri, Jan 31, 2014 at 06:53:40AM -0500, Tejun Heo wrote:
> The patch description doesn't describe the reasons for the change at
> all. What issues have you encountered? What are you trying to
> achieve?
Okay, is it that on NUMA configurations the probing ends up being
bounced through work_on_cpu(), so tracking whether "async" was used
doesn't work, making module loading finish before async probing is
complete? If that's the case, wouldn't a more generic workaround be
handling that in work_on_cpu()? You can change work_for_cpu_fn() to
clear ASYNC_USED before invoking the function and transfer the bit to
struct work_for_cpu so that work_on_cpu() can again transfer it to the
calling task.
Thanks.
--
tejun
* Re: PATCH: platforms: avoid queuing work if possible
2014-01-31 11:53 ` Tejun Heo
2014-01-31 12:00 ` Tejun Heo
@ 2014-02-05 4:13 ` Peter Chang
2014-02-10 22:41 ` Tejun Heo
1 sibling, 1 reply; 5+ messages in thread
From: Peter Chang @ 2014-02-05 4:13 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-pci@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]
sorry for the delay, this got filtered into the
check-every-once-in-while bucket.
2014-01-31 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Thu, Jan 30, 2014 at 05:24:16PM -0800, Peter Chang wrote:
>> 774a1221e862b343388347bac9b318767336b20b tries to avoid an expensive sync
>> by marking the current task iff it uses the async_schedule() machinery.
>
> That wasn't an optimization. It was a workaround for deadlock.
sorry for the mis-characterization.
> The patch description doesn't describe the reasons for the change at
> all. What issues have you encountered? What are you trying to
> achieve?
the problem is that our current user-space environment expects block
devices to be in /sys/block after the module load completes. if the
cross cpu worker task is marked w/ ASYNC_USED (rather than the module
loading task), module loading completes and the async tasks are
'racing' w/ user space.
i fixed up the patch to just do the ASYNC_USED copying in the worker
function (since there was some bogus-ness in the original w/
get_cpu()). does this patch make more sense?
\p
[-- Attachment #2: 0001-pci-copy-PF_USED_ASYNC-in-the-cross-cpu-probe-case.patch --]
[-- Type: application/octet-stream, Size: 2675 bytes --]
From c1783408b64795040403345cdbd852d9cfd0f552 Mon Sep 17 00:00:00 2001
From: peter chang <dpf@google.com>
Date: Tue, 4 Feb 2014 19:51:03 -0800
Subject: [PATCH] pci: copy PF_USED_ASYNC in the cross cpu probe case
774a1221e862b343388347bac9b318767336b20b avoids a deadlock condition
by marking the current task iff it uses the async_schedule() machinery.
however, if part of the driver's probe is called through a worker
thread, the bookkeeping fails to mark the correct task as having used
the async machinery. since the module loading task isn't marked, it no
longer does an async_synchronize_full(), which means there's a race
between user space (which expects block devices in /sys/block) and the
async tasks.
Tested:
- checked on vebmy2 (intel sandybridge + disktray) which is where the
original issue was discovered.
- checked on fdha347 (intel westmere + multiple disktrays) to show that
nothing changed when not making a cross cpu call.
Signed-off-by: peter chang <dpf@google.com>
---
drivers/pci/pci-driver.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 25f0bc6..9d7f294 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -247,6 +247,7 @@ struct drv_dev_and_id {
struct pci_driver *drv;
struct pci_dev *dev;
const struct pci_device_id *id;
+ unsigned int task_flags;
};
static long local_pci_probe(void *_ddi)
@@ -268,8 +269,11 @@ static long local_pci_probe(void *_ddi)
pm_runtime_get_sync(dev);
pci_dev->driver = pci_drv;
rc = pci_drv->probe(pci_dev, ddi->id);
- if (!rc)
+ if (!rc) {
+ if (current)
+ ddi->task_flags = current->flags;
return rc;
+ }
if (rc < 0) {
pci_dev->driver = NULL;
pm_runtime_put_sync(dev);
@@ -287,7 +291,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
const struct pci_device_id *id)
{
int error, node;
- struct drv_dev_and_id ddi = { drv, dev, id };
+ struct drv_dev_and_id ddi = { drv, dev, id, 0 };
/*
* Execute driver initialization on node where the device is
@@ -314,9 +318,15 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
get_online_cpus();
cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
- if (cpu < nr_cpu_ids)
+ if (cpu < nr_cpu_ids) {
error = work_on_cpu(cpu, local_pci_probe, &ddi);
- else
+ /* copy out some task flags for the module loading
+ * case.
+ */
+ if (!error && current)
+ current->flags |= (ddi.task_flags &
+ PF_USED_ASYNC);
+ } else
error = local_pci_probe(&ddi);
put_online_cpus();
} else
--
1.9.0.rc1.175.g0b1dcb5
* Re: PATCH: platforms: avoid queuing work if possible
2014-02-05 4:13 ` Peter Chang
@ 2014-02-10 22:41 ` Tejun Heo
0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2014-02-10 22:41 UTC (permalink / raw)
To: Peter Chang; +Cc: linux-pci@vger.kernel.org
Hello, Peter.
On Tue, Feb 04, 2014 at 08:13:58PM -0800, Peter Chang wrote:
> the problem is that our current user-space environment expects block
> devices to be in /sys/block after the module load completes. if the
> cross cpu worker task is marked w/ ASYNC_USED (rather than the module
> loading task), module loading completes and the async tasks are
> 'racing' w/ user space.
>
> i fixed up the patch to just do the ASYNC_USED copying in the worker
> function (since there was some bogus-ness in the original w/
> get_cpu()). does this patch make more sense?
Heh, this is almost inherently nasty. The whole thing is a nasty hack
after all. :( Would it be possible to make work_on_cpu() do the
copying instead? That might be a bit cleaner, and I think it would be
better to keep the workaround on the async / workqueue side rather
than in specific drivers.
Thanks.
--
tejun