* [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3
@ 2014-05-16 16:16 Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues Frederic Weisbecker
` (5 more replies)
0 siblings, 6 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-16 16:16 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
Lai Jiangshan, Mike Galbraith, Paul E. McKenney, Tejun Heo,
Viresh Kumar
So in this version I actually save the cpumask belonging to the wq (before
it's intersected with the low-level cpumask) in its unbound attrs.
But the attrs passed to pwqs and worker pools carry the low-level cpumask
intersected with the wq cpumask.
It's easier that way as the wq cpumask itself remains untouched.
It's a user setting so it must stay intact. OTOH the cpumasks of pwqs and
worker pools belong only to the implementation, so they are the right
place to store the effective cpumask.
Also wq_update_unbound_numa() is fixed, and cpu_possible_mask
is restored as the base. Thanks to Lai!
Thanks,
Frederic
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
core/workqueue-v5
---
Frederic Weisbecker (4):
workqueue: Reorder sysfs code
workqueue: Create low-level unbound workqueues cpumask
workqueue: Split apply attrs code from its locking
workqueue: Allow modifying low level unbound workqueue cpumask
Lai Jiangshan (1):
workqueue: Allow changing attributions of ordered workqueues
kernel/workqueue.c | 1674 ++++++++++++++++++++++++++++------------------------
1 file changed, 900 insertions(+), 774 deletions(-)
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-16 16:16 [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Frederic Weisbecker
@ 2014-05-16 16:16 ` Frederic Weisbecker
2014-05-16 20:12 ` Tejun Heo
2014-05-21 7:29 ` Lai Jiangshan
2014-05-16 16:16 ` [PATCH 2/5] workqueue: Reorder sysfs code Frederic Weisbecker
` (4 subsequent siblings)
5 siblings, 2 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-16 16:16 UTC (permalink / raw)
To: LKML
Cc: Lai Jiangshan, Christoph Lameter, Kevin Hilman, Mike Galbraith,
Paul E. McKenney, Tejun Heo, Viresh Kumar, Frederic Weisbecker
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Changing the attributes of a workqueue implies adding new pwqs
to replace the old ones. But the current implementation doesn't handle
ordered workqueues, because they can't carry multiple pwqs without
breaking ordering. Hence ordered workqueues currently aren't allowed to
call apply_workqueue_attrs().
This results in several special cases in the workqueue code to handle
ordered workqueues. And with the addition of the global workqueue cpumask,
these special cases are going to spread even further as the number
of callers of apply_workqueue_attrs() increases.
So we want apply_workqueue_attrs() to be smarter and able to
handle all sorts of unbound workqueues.
This solution proposes to create new pwqs on ordered workqueues with
max_active initialized to zero. Then when the older pwq is finally
released, the new one becomes active and its max_active value is set to 1.
This way we make sure that only a single pwq ever runs at a given time
on an ordered workqueue.
This enforces ordering and non-reentrancy at a higher level.
Note that ordered works then become exceptions and aren't subject to the
previous-pool requeue that usually guarantees non-reentrancy when works
requeue themselves back-to-back. Otherwise it could prevent a pool switch
from ever happening by delaying the release of the old pool forever and
never letting the new one in.
Now that we can change ordered workqueue attributes, let's allow them
to be registered as WQ_SYSFS and allow changing their nice and cpumask
values. Note that in order to preserve the ordering guarantee, we still
disallow changing their max_active and no_numa values.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/workqueue.c | 69 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 42 insertions(+), 27 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c3f076f..c68e84f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1355,8 +1355,16 @@ retry:
* If @work was previously on a different pool, it might still be
* running there, in which case the work needs to be queued on that
* pool to guarantee non-reentrancy.
+ *
+ * We guarantee that only one pwq is active on an ordered workqueue.
+ * That alone enforces non-reentrancy for works. So ordered works don't
+ * need to be requeued to their previous pool. Not to mention that
+ * an ordered work requeueing itself over and over on the same pool may
+ * prevent a pwq from being released in case of a pool switch. The
+ * newest pool in that case couldn't switch in and its pending works
+ * would starve.
*/
- last_pool = get_work_pool(work);
+ last_pool = wq->flags & __WQ_ORDERED ? NULL : get_work_pool(work);
if (last_pool && last_pool != pwq->pool) {
struct worker *worker;
@@ -3319,6 +3327,10 @@ static ssize_t wq_numa_store(struct device *dev, struct device_attribute *attr,
struct workqueue_attrs *attrs;
int v, ret;
+ /* Creating per-node pwqs breaks ordering guarantee. Keep no_numa = 1 */
+ if (WARN_ON(wq->flags & __WQ_ORDERED))
+ return -EINVAL;
+
attrs = wq_sysfs_prep_attrs(wq);
if (!attrs)
return -ENOMEM;
@@ -3379,14 +3391,6 @@ int workqueue_sysfs_register(struct workqueue_struct *wq)
struct wq_device *wq_dev;
int ret;
- /*
- * Adjusting max_active or creating new pwqs by applyting
- * attributes breaks ordering guarantee. Disallow exposing ordered
- * workqueues.
- */
- if (WARN_ON(wq->flags & __WQ_ORDERED))
- return -EINVAL;
-
wq->wq_dev = wq_dev = kzalloc(sizeof(*wq_dev), GFP_KERNEL);
if (!wq_dev)
return -ENOMEM;
@@ -3708,6 +3712,13 @@ static void rcu_free_pwq(struct rcu_head *rcu)
container_of(rcu, struct pool_workqueue, rcu));
}
+static struct pool_workqueue *oldest_pwq(struct workqueue_struct *wq)
+{
+ return list_last_entry(&wq->pwqs, struct pool_workqueue, pwqs_node);
+}
+
+static void pwq_adjust_max_active(struct pool_workqueue *pwq);
+
/*
* Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt
* and needs to be destroyed.
@@ -3723,14 +3734,12 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
if (WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND)))
return;
- /*
- * Unlink @pwq. Synchronization against wq->mutex isn't strictly
- * necessary on release but do it anyway. It's easier to verify
- * and consistent with the linking path.
- */
mutex_lock(&wq->mutex);
list_del_rcu(&pwq->pwqs_node);
is_last = list_empty(&wq->pwqs);
+ /* try to activate the oldest pwq when needed */
+ if (!is_last && (wq->flags & __WQ_ORDERED))
+ pwq_adjust_max_active(oldest_pwq(wq));
mutex_unlock(&wq->mutex);
mutex_lock(&wq_pool_mutex);
@@ -3749,6 +3758,16 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
}
}
+static bool pwq_active(struct pool_workqueue *pwq)
+{
+ /* Only the oldest pwq is active in the ordered wq */
+ if (pwq->wq->flags & __WQ_ORDERED)
+ return pwq == oldest_pwq(pwq->wq);
+
+ /* All pwqs in the non-ordered wq are active */
+ return true;
+}
+
/**
* pwq_adjust_max_active - update a pwq's max_active to the current setting
* @pwq: target pool_workqueue
@@ -3771,7 +3790,8 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
spin_lock_irq(&pwq->pool->lock);
- if (!freezable || !(pwq->pool->flags & POOL_FREEZING)) {
+ if ((!freezable || !(pwq->pool->flags & POOL_FREEZING)) &&
+ pwq_active(pwq)) {
pwq->max_active = wq->saved_max_active;
while (!list_empty(&pwq->delayed_works) &&
@@ -3825,11 +3845,11 @@ static void link_pwq(struct pool_workqueue *pwq)
*/
pwq->work_color = wq->work_color;
+ /* link in @pwq on the head of &wq->pwqs */
+ list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
+
/* sync max_active to the current setting */
pwq_adjust_max_active(pwq);
-
- /* link in @pwq */
- list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
}
/* obtain a pool matching @attr and create a pwq associating the pool and @wq */
@@ -3955,8 +3975,8 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
return -EINVAL;
- /* creating multiple pwqs breaks ordering guarantee */
- if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
+ /* creating multiple per-node pwqs breaks ordering guarantee */
+ if (WARN_ON((wq->flags & __WQ_ORDERED) && !attrs->no_numa))
return -EINVAL;
pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
@@ -4146,7 +4166,7 @@ out_unlock:
static int alloc_and_link_pwqs(struct workqueue_struct *wq)
{
bool highpri = wq->flags & WQ_HIGHPRI;
- int cpu, ret;
+ int cpu;
if (!(wq->flags & WQ_UNBOUND)) {
wq->cpu_pwqs = alloc_percpu(struct pool_workqueue);
@@ -4167,12 +4187,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
}
return 0;
} else if (wq->flags & __WQ_ORDERED) {
- ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
- /* there should only be single pwq for ordering guarantee */
- WARN(!ret && (wq->pwqs.next != &wq->dfl_pwq->pwqs_node ||
- wq->pwqs.prev != &wq->dfl_pwq->pwqs_node),
- "ordering guarantee broken for workqueue %s\n", wq->name);
- return ret;
+ return apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
} else {
return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 2/5] workqueue: Reorder sysfs code
2014-05-16 16:16 [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues Frederic Weisbecker
@ 2014-05-16 16:16 ` Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask Frederic Weisbecker
` (3 subsequent siblings)
5 siblings, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-16 16:16 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
Lai Jiangshan, Mike Galbraith, Paul E. McKenney, Tejun Heo,
Viresh Kumar
The sysfs code usually belongs to the bottom of the file since it deals
with high-level objects. In the workqueue code it's misplaced, such that
we'd need to work around function references (forward declarations) to
allow the sysfs code to call APIs like apply_workqueue_attrs().
Let's move that block further down in the file, right above
alloc_workqueue_key() which references it.
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/workqueue.c | 626 ++++++++++++++++++++++++++---------------------------
1 file changed, 313 insertions(+), 313 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c68e84f..e5d7719 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3137,319 +3137,6 @@ int execute_in_process_context(work_func_t fn, struct execute_work *ew)
}
EXPORT_SYMBOL_GPL(execute_in_process_context);
-#ifdef CONFIG_SYSFS
-/*
- * Workqueues with WQ_SYSFS flag set is visible to userland via
- * /sys/bus/workqueue/devices/WQ_NAME. All visible workqueues have the
- * following attributes.
- *
- * per_cpu RO bool : whether the workqueue is per-cpu or unbound
- * max_active RW int : maximum number of in-flight work items
- *
- * Unbound workqueues have the following extra attributes.
- *
- * id RO int : the associated pool ID
- * nice RW int : nice value of the workers
- * cpumask RW mask : bitmask of allowed CPUs for the workers
- */
-struct wq_device {
- struct workqueue_struct *wq;
- struct device dev;
-};
-
-static struct workqueue_struct *dev_to_wq(struct device *dev)
-{
- struct wq_device *wq_dev = container_of(dev, struct wq_device, dev);
-
- return wq_dev->wq;
-}
-
-static ssize_t per_cpu_show(struct device *dev, struct device_attribute *attr,
- char *buf)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
-
- return scnprintf(buf, PAGE_SIZE, "%d\n", (bool)!(wq->flags & WQ_UNBOUND));
-}
-static DEVICE_ATTR_RO(per_cpu);
-
-static ssize_t max_active_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
-
- return scnprintf(buf, PAGE_SIZE, "%d\n", wq->saved_max_active);
-}
-
-static ssize_t max_active_store(struct device *dev,
- struct device_attribute *attr, const char *buf,
- size_t count)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- int val;
-
- if (sscanf(buf, "%d", &val) != 1 || val <= 0)
- return -EINVAL;
-
- workqueue_set_max_active(wq, val);
- return count;
-}
-static DEVICE_ATTR_RW(max_active);
-
-static struct attribute *wq_sysfs_attrs[] = {
- &dev_attr_per_cpu.attr,
- &dev_attr_max_active.attr,
- NULL,
-};
-ATTRIBUTE_GROUPS(wq_sysfs);
-
-static ssize_t wq_pool_ids_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- const char *delim = "";
- int node, written = 0;
-
- rcu_read_lock_sched();
- for_each_node(node) {
- written += scnprintf(buf + written, PAGE_SIZE - written,
- "%s%d:%d", delim, node,
- unbound_pwq_by_node(wq, node)->pool->id);
- delim = " ";
- }
- written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
- rcu_read_unlock_sched();
-
- return written;
-}
-
-static ssize_t wq_nice_show(struct device *dev, struct device_attribute *attr,
- char *buf)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- int written;
-
- mutex_lock(&wq->mutex);
- written = scnprintf(buf, PAGE_SIZE, "%d\n", wq->unbound_attrs->nice);
- mutex_unlock(&wq->mutex);
-
- return written;
-}
-
-/* prepare workqueue_attrs for sysfs store operations */
-static struct workqueue_attrs *wq_sysfs_prep_attrs(struct workqueue_struct *wq)
-{
- struct workqueue_attrs *attrs;
-
- attrs = alloc_workqueue_attrs(GFP_KERNEL);
- if (!attrs)
- return NULL;
-
- mutex_lock(&wq->mutex);
- copy_workqueue_attrs(attrs, wq->unbound_attrs);
- mutex_unlock(&wq->mutex);
- return attrs;
-}
-
-static ssize_t wq_nice_store(struct device *dev, struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- struct workqueue_attrs *attrs;
- int ret;
-
- attrs = wq_sysfs_prep_attrs(wq);
- if (!attrs)
- return -ENOMEM;
-
- if (sscanf(buf, "%d", &attrs->nice) == 1 &&
- attrs->nice >= MIN_NICE && attrs->nice <= MAX_NICE)
- ret = apply_workqueue_attrs(wq, attrs);
- else
- ret = -EINVAL;
-
- free_workqueue_attrs(attrs);
- return ret ?: count;
-}
-
-static ssize_t wq_cpumask_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- int written;
-
- mutex_lock(&wq->mutex);
- written = cpumask_scnprintf(buf, PAGE_SIZE, wq->unbound_attrs->cpumask);
- mutex_unlock(&wq->mutex);
-
- written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
- return written;
-}
-
-static ssize_t wq_cpumask_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- struct workqueue_attrs *attrs;
- int ret;
-
- attrs = wq_sysfs_prep_attrs(wq);
- if (!attrs)
- return -ENOMEM;
-
- ret = cpumask_parse(buf, attrs->cpumask);
- if (!ret)
- ret = apply_workqueue_attrs(wq, attrs);
-
- free_workqueue_attrs(attrs);
- return ret ?: count;
-}
-
-static ssize_t wq_numa_show(struct device *dev, struct device_attribute *attr,
- char *buf)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- int written;
-
- mutex_lock(&wq->mutex);
- written = scnprintf(buf, PAGE_SIZE, "%d\n",
- !wq->unbound_attrs->no_numa);
- mutex_unlock(&wq->mutex);
-
- return written;
-}
-
-static ssize_t wq_numa_store(struct device *dev, struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct workqueue_struct *wq = dev_to_wq(dev);
- struct workqueue_attrs *attrs;
- int v, ret;
-
- /* Creating per-node pwqs breaks ordering guarantee. Keep no_numa = 1 */
- if (WARN_ON(wq->flags & __WQ_ORDERED))
- return -EINVAL;
-
- attrs = wq_sysfs_prep_attrs(wq);
- if (!attrs)
- return -ENOMEM;
-
- ret = -EINVAL;
- if (sscanf(buf, "%d", &v) == 1) {
- attrs->no_numa = !v;
- ret = apply_workqueue_attrs(wq, attrs);
- }
-
- free_workqueue_attrs(attrs);
- return ret ?: count;
-}
-
-static struct device_attribute wq_sysfs_unbound_attrs[] = {
- __ATTR(pool_ids, 0444, wq_pool_ids_show, NULL),
- __ATTR(nice, 0644, wq_nice_show, wq_nice_store),
- __ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store),
- __ATTR(numa, 0644, wq_numa_show, wq_numa_store),
- __ATTR_NULL,
-};
-
-static struct bus_type wq_subsys = {
- .name = "workqueue",
- .dev_groups = wq_sysfs_groups,
-};
-
-static int __init wq_sysfs_init(void)
-{
- return subsys_virtual_register(&wq_subsys, NULL);
-}
-core_initcall(wq_sysfs_init);
-
-static void wq_device_release(struct device *dev)
-{
- struct wq_device *wq_dev = container_of(dev, struct wq_device, dev);
-
- kfree(wq_dev);
-}
-
-/**
- * workqueue_sysfs_register - make a workqueue visible in sysfs
- * @wq: the workqueue to register
- *
- * Expose @wq in sysfs under /sys/bus/workqueue/devices.
- * alloc_workqueue*() automatically calls this function if WQ_SYSFS is set
- * which is the preferred method.
- *
- * Workqueue user should use this function directly iff it wants to apply
- * workqueue_attrs before making the workqueue visible in sysfs; otherwise,
- * apply_workqueue_attrs() may race against userland updating the
- * attributes.
- *
- * Return: 0 on success, -errno on failure.
- */
-int workqueue_sysfs_register(struct workqueue_struct *wq)
-{
- struct wq_device *wq_dev;
- int ret;
-
- wq->wq_dev = wq_dev = kzalloc(sizeof(*wq_dev), GFP_KERNEL);
- if (!wq_dev)
- return -ENOMEM;
-
- wq_dev->wq = wq;
- wq_dev->dev.bus = &wq_subsys;
- wq_dev->dev.init_name = wq->name;
- wq_dev->dev.release = wq_device_release;
-
- /*
- * unbound_attrs are created separately. Suppress uevent until
- * everything is ready.
- */
- dev_set_uevent_suppress(&wq_dev->dev, true);
-
- ret = device_register(&wq_dev->dev);
- if (ret) {
- kfree(wq_dev);
- wq->wq_dev = NULL;
- return ret;
- }
-
- if (wq->flags & WQ_UNBOUND) {
- struct device_attribute *attr;
-
- for (attr = wq_sysfs_unbound_attrs; attr->attr.name; attr++) {
- ret = device_create_file(&wq_dev->dev, attr);
- if (ret) {
- device_unregister(&wq_dev->dev);
- wq->wq_dev = NULL;
- return ret;
- }
- }
- }
-
- kobject_uevent(&wq_dev->dev.kobj, KOBJ_ADD);
- return 0;
-}
-
-/**
- * workqueue_sysfs_unregister - undo workqueue_sysfs_register()
- * @wq: the workqueue to unregister
- *
- * If @wq is registered to sysfs by workqueue_sysfs_register(), unregister.
- */
-static void workqueue_sysfs_unregister(struct workqueue_struct *wq)
-{
- struct wq_device *wq_dev = wq->wq_dev;
-
- if (!wq->wq_dev)
- return;
-
- wq->wq_dev = NULL;
- device_unregister(&wq_dev->dev);
-}
-#else /* CONFIG_SYSFS */
-static void workqueue_sysfs_unregister(struct workqueue_struct *wq) { }
-#endif /* CONFIG_SYSFS */
-
/**
* free_workqueue_attrs - free a workqueue_attrs
* @attrs: workqueue_attrs to free
@@ -4163,6 +3850,319 @@ out_unlock:
put_pwq_unlocked(old_pwq);
}
+#ifdef CONFIG_SYSFS
+/*
+ * Workqueues with WQ_SYSFS flag set is visible to userland via
+ * /sys/bus/workqueue/devices/WQ_NAME. All visible workqueues have the
+ * following attributes.
+ *
+ * per_cpu RO bool : whether the workqueue is per-cpu or unbound
+ * max_active RW int : maximum number of in-flight work items
+ *
+ * Unbound workqueues have the following extra attributes.
+ *
+ * id RO int : the associated pool ID
+ * nice RW int : nice value of the workers
+ * cpumask RW mask : bitmask of allowed CPUs for the workers
+ */
+struct wq_device {
+ struct workqueue_struct *wq;
+ struct device dev;
+};
+
+static struct workqueue_struct *dev_to_wq(struct device *dev)
+{
+ struct wq_device *wq_dev = container_of(dev, struct wq_device, dev);
+
+ return wq_dev->wq;
+}
+
+static ssize_t per_cpu_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+
+ return scnprintf(buf, PAGE_SIZE, "%d\n", (bool)!(wq->flags & WQ_UNBOUND));
+}
+static DEVICE_ATTR_RO(per_cpu);
+
+static ssize_t max_active_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+
+ return scnprintf(buf, PAGE_SIZE, "%d\n", wq->saved_max_active);
+}
+
+static ssize_t max_active_store(struct device *dev,
+ struct device_attribute *attr, const char *buf,
+ size_t count)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ int val;
+
+ if (sscanf(buf, "%d", &val) != 1 || val <= 0)
+ return -EINVAL;
+
+ workqueue_set_max_active(wq, val);
+ return count;
+}
+static DEVICE_ATTR_RW(max_active);
+
+static struct attribute *wq_sysfs_attrs[] = {
+ &dev_attr_per_cpu.attr,
+ &dev_attr_max_active.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(wq_sysfs);
+
+static ssize_t wq_pool_ids_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ const char *delim = "";
+ int node, written = 0;
+
+ rcu_read_lock_sched();
+ for_each_node(node) {
+ written += scnprintf(buf + written, PAGE_SIZE - written,
+ "%s%d:%d", delim, node,
+ unbound_pwq_by_node(wq, node)->pool->id);
+ delim = " ";
+ }
+ written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
+ rcu_read_unlock_sched();
+
+ return written;
+}
+
+static ssize_t wq_nice_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ int written;
+
+ mutex_lock(&wq->mutex);
+ written = scnprintf(buf, PAGE_SIZE, "%d\n", wq->unbound_attrs->nice);
+ mutex_unlock(&wq->mutex);
+
+ return written;
+}
+
+/* prepare workqueue_attrs for sysfs store operations */
+static struct workqueue_attrs *wq_sysfs_prep_attrs(struct workqueue_struct *wq)
+{
+ struct workqueue_attrs *attrs;
+
+ attrs = alloc_workqueue_attrs(GFP_KERNEL);
+ if (!attrs)
+ return NULL;
+
+ mutex_lock(&wq->mutex);
+ copy_workqueue_attrs(attrs, wq->unbound_attrs);
+ mutex_unlock(&wq->mutex);
+ return attrs;
+}
+
+static ssize_t wq_nice_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ struct workqueue_attrs *attrs;
+ int ret;
+
+ attrs = wq_sysfs_prep_attrs(wq);
+ if (!attrs)
+ return -ENOMEM;
+
+ if (sscanf(buf, "%d", &attrs->nice) == 1 &&
+ attrs->nice >= MIN_NICE && attrs->nice <= MAX_NICE)
+ ret = apply_workqueue_attrs(wq, attrs);
+ else
+ ret = -EINVAL;
+
+ free_workqueue_attrs(attrs);
+ return ret ?: count;
+}
+
+static ssize_t wq_cpumask_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ int written;
+
+ mutex_lock(&wq->mutex);
+ written = cpumask_scnprintf(buf, PAGE_SIZE, wq->unbound_attrs->cpumask);
+ mutex_unlock(&wq->mutex);
+
+ written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
+ return written;
+}
+
+static ssize_t wq_cpumask_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ struct workqueue_attrs *attrs;
+ int ret;
+
+ attrs = wq_sysfs_prep_attrs(wq);
+ if (!attrs)
+ return -ENOMEM;
+
+ ret = cpumask_parse(buf, attrs->cpumask);
+ if (!ret)
+ ret = apply_workqueue_attrs(wq, attrs);
+
+ free_workqueue_attrs(attrs);
+ return ret ?: count;
+}
+
+static ssize_t wq_numa_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ int written;
+
+ mutex_lock(&wq->mutex);
+ written = scnprintf(buf, PAGE_SIZE, "%d\n",
+ !wq->unbound_attrs->no_numa);
+ mutex_unlock(&wq->mutex);
+
+ return written;
+}
+
+static ssize_t wq_numa_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ struct workqueue_attrs *attrs;
+ int v, ret;
+
+ /* Creating per-node pwqs breaks ordering guarantee. Keep no_numa = 1 */
+ if (WARN_ON(wq->flags & __WQ_ORDERED))
+ return -EINVAL;
+
+ attrs = wq_sysfs_prep_attrs(wq);
+ if (!attrs)
+ return -ENOMEM;
+
+ ret = -EINVAL;
+ if (sscanf(buf, "%d", &v) == 1) {
+ attrs->no_numa = !v;
+ ret = apply_workqueue_attrs(wq, attrs);
+ }
+
+ free_workqueue_attrs(attrs);
+ return ret ?: count;
+}
+
+static struct device_attribute wq_sysfs_unbound_attrs[] = {
+ __ATTR(pool_ids, 0444, wq_pool_ids_show, NULL),
+ __ATTR(nice, 0644, wq_nice_show, wq_nice_store),
+ __ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store),
+ __ATTR(numa, 0644, wq_numa_show, wq_numa_store),
+ __ATTR_NULL,
+};
+
+static struct bus_type wq_subsys = {
+ .name = "workqueue",
+ .dev_groups = wq_sysfs_groups,
+};
+
+static int __init wq_sysfs_init(void)
+{
+ return subsys_virtual_register(&wq_subsys, NULL);
+}
+core_initcall(wq_sysfs_init);
+
+static void wq_device_release(struct device *dev)
+{
+ struct wq_device *wq_dev = container_of(dev, struct wq_device, dev);
+
+ kfree(wq_dev);
+}
+
+/**
+ * workqueue_sysfs_register - make a workqueue visible in sysfs
+ * @wq: the workqueue to register
+ *
+ * Expose @wq in sysfs under /sys/bus/workqueue/devices.
+ * alloc_workqueue*() automatically calls this function if WQ_SYSFS is set
+ * which is the preferred method.
+ *
+ * Workqueue user should use this function directly iff it wants to apply
+ * workqueue_attrs before making the workqueue visible in sysfs; otherwise,
+ * apply_workqueue_attrs() may race against userland updating the
+ * attributes.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int workqueue_sysfs_register(struct workqueue_struct *wq)
+{
+ struct wq_device *wq_dev;
+ int ret;
+
+ wq->wq_dev = wq_dev = kzalloc(sizeof(*wq_dev), GFP_KERNEL);
+ if (!wq_dev)
+ return -ENOMEM;
+
+ wq_dev->wq = wq;
+ wq_dev->dev.bus = &wq_subsys;
+ wq_dev->dev.init_name = wq->name;
+ wq_dev->dev.release = wq_device_release;
+
+ /*
+ * unbound_attrs are created separately. Suppress uevent until
+ * everything is ready.
+ */
+ dev_set_uevent_suppress(&wq_dev->dev, true);
+
+ ret = device_register(&wq_dev->dev);
+ if (ret) {
+ kfree(wq_dev);
+ wq->wq_dev = NULL;
+ return ret;
+ }
+
+ if (wq->flags & WQ_UNBOUND) {
+ struct device_attribute *attr;
+
+ for (attr = wq_sysfs_unbound_attrs; attr->attr.name; attr++) {
+ ret = device_create_file(&wq_dev->dev, attr);
+ if (ret) {
+ device_unregister(&wq_dev->dev);
+ wq->wq_dev = NULL;
+ return ret;
+ }
+ }
+ }
+
+ kobject_uevent(&wq_dev->dev.kobj, KOBJ_ADD);
+ return 0;
+}
+
+/**
+ * workqueue_sysfs_unregister - undo workqueue_sysfs_register()
+ * @wq: the workqueue to unregister
+ *
+ * If @wq is registered to sysfs by workqueue_sysfs_register(), unregister.
+ */
+static void workqueue_sysfs_unregister(struct workqueue_struct *wq)
+{
+ struct wq_device *wq_dev = wq->wq_dev;
+
+ if (!wq->wq_dev)
+ return;
+
+ wq->wq_dev = NULL;
+ device_unregister(&wq_dev->dev);
+}
+#else /* CONFIG_SYSFS */
+static void workqueue_sysfs_unregister(struct workqueue_struct *wq) { }
+#endif /* CONFIG_SYSFS */
+
static int alloc_and_link_pwqs(struct workqueue_struct *wq)
{
bool highpri = wq->flags & WQ_HIGHPRI;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 16:16 [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 2/5] workqueue: Reorder sysfs code Frederic Weisbecker
@ 2014-05-16 16:16 ` Frederic Weisbecker
2014-05-16 17:52 ` Christoph Lameter
2014-05-16 16:16 ` [PATCH 4/5] workqueue: Split apply attrs code from its locking Frederic Weisbecker
` (2 subsequent siblings)
5 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-16 16:16 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
Lai Jiangshan, Mike Galbraith, Paul E. McKenney, Tejun Heo,
Viresh Kumar
Create a cpumask that limits the affinity of all unbound workqueues.
This cpumask is controlled through a file at the root of the workqueue
sysfs directory.
It works at a lower level than the per-WQ_SYSFS workqueue cpumask files,
such that the effective cpumask applied to a given unbound workqueue is
the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
This patch implements the basic infrastructure and the read interface.
cpumask_unbounds is initially set to cpu_possible_mask.
Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/workqueue.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e5d7719..1252a8c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -293,6 +293,8 @@ static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */
static LIST_HEAD(workqueues); /* PL: list of all workqueues */
static bool workqueue_freezing; /* PL: have wqs started freezing? */
+static cpumask_var_t wq_unbound_cpumask;
+
/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
cpu_worker_pools);
@@ -3674,7 +3676,7 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
/* make a copy of @attrs and sanitize it */
copy_workqueue_attrs(new_attrs, attrs);
- cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
+ cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
/*
* We may create multiple pwqs with differing cpumasks. Make a
@@ -4071,9 +4073,29 @@ static struct bus_type wq_subsys = {
.dev_groups = wq_sysfs_groups,
};
+static ssize_t unbounds_cpumask_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int written;
+
+ written = cpumask_scnprintf(buf, PAGE_SIZE, wq_unbound_cpumask);
+ written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
+
+ return written;
+}
+
+static struct device_attribute wq_sysfs_cpumask_attr =
+ __ATTR(cpumask, 0444, unbounds_cpumask_show, NULL);
+
static int __init wq_sysfs_init(void)
{
- return subsys_virtual_register(&wq_subsys, NULL);
+ int err;
+
+ err = subsys_virtual_register(&wq_subsys, NULL);
+ if (err)
+ return err;
+
+ return device_create_file(wq_subsys.dev_root, &wq_sysfs_cpumask_attr);
}
core_initcall(wq_sysfs_init);
@@ -5068,6 +5090,9 @@ static int __init init_workqueues(void)
WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
+ BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
+ cpumask_copy(wq_unbound_cpumask, cpu_possible_mask);
+
pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 4/5] workqueue: Split apply attrs code from its locking
2014-05-16 16:16 [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Frederic Weisbecker
` (2 preceding siblings ...)
2014-05-16 16:16 ` [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask Frederic Weisbecker
@ 2014-05-16 16:16 ` Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask Frederic Weisbecker
2014-07-11 8:41 ` [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Lai Jiangshan
5 siblings, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-16 16:16 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
Lai Jiangshan, Mike Galbraith, Paul E. McKenney, Tejun Heo,
Viresh Kumar
In order to allow overriding the low-level cpumask of unbound workqueues,
we need to be able to call apply_workqueue_attrs() on every workqueue in
the workqueue list.
But since traversing that list requires holding wq_pool_mutex, and
apply_workqueue_attrs() takes that same lock, we can't currently call it
from within the traversal.
So let's provide a version of apply_workqueue_attrs() that can be called
with the pool mutex already held.
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/workqueue.c | 77 +++++++++++++++++++++++++++++++-----------------------
1 file changed, 44 insertions(+), 33 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1252a8c..2aa296d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3637,24 +3637,9 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
return old_pwq;
}
-/**
- * apply_workqueue_attrs - apply new workqueue_attrs to an unbound workqueue
- * @wq: the target workqueue
- * @attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs()
- *
- * Apply @attrs to an unbound workqueue @wq. Unless disabled, on NUMA
- * machines, this function maps a separate pwq to each NUMA node with
- * possibles CPUs in @attrs->cpumask so that work items are affine to the
- * NUMA node it was issued on. Older pwqs are released as in-flight work
- * items finish. Note that a work item which repeatedly requeues itself
- * back-to-back will stay on its current pwq.
- *
- * Performs GFP_KERNEL allocations.
- *
- * Return: 0 on success and -errno on failure.
- */
-int apply_workqueue_attrs(struct workqueue_struct *wq,
- const struct workqueue_attrs *attrs)
+static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
+ const struct workqueue_attrs *attrs,
+ cpumask_var_t unbounds_cpumask)
{
struct workqueue_attrs *new_attrs, *tmp_attrs;
struct pool_workqueue **pwq_tbl, *dfl_pwq;
@@ -3676,7 +3661,7 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
/* make a copy of @attrs and sanitize it */
copy_workqueue_attrs(new_attrs, attrs);
- cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
+ cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbounds_cpumask);
/*
* We may create multiple pwqs with differing cpumasks. Make a
@@ -3686,15 +3671,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
copy_workqueue_attrs(tmp_attrs, new_attrs);
/*
- * CPUs should stay stable across pwq creations and installations.
- * Pin CPUs, determine the target cpumask for each node and create
- * pwqs accordingly.
- */
- get_online_cpus();
-
- mutex_lock(&wq_pool_mutex);
-
- /*
* If something goes wrong during CPU up/down, we'll fall back to
* the default pwq covering whole @attrs->cpumask. Always create
* it even if we don't use it immediately.
@@ -3714,8 +3690,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
}
}
- mutex_unlock(&wq_pool_mutex);
-
/* all pwqs have been created successfully, let's install'em */
mutex_lock(&wq->mutex);
@@ -3736,7 +3710,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
put_pwq_unlocked(pwq_tbl[node]);
put_pwq_unlocked(dfl_pwq);
- put_online_cpus();
ret = 0;
/* fall through */
out_free:
@@ -3750,14 +3723,52 @@ enomem_pwq:
for_each_node(node)
if (pwq_tbl && pwq_tbl[node] != dfl_pwq)
free_unbound_pwq(pwq_tbl[node]);
- mutex_unlock(&wq_pool_mutex);
- put_online_cpus();
enomem:
ret = -ENOMEM;
goto out_free;
}
/**
+ * apply_workqueue_attrs - apply new workqueue_attrs to an unbound workqueue
+ * @wq: the target workqueue
+ * @attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs()
+ *
+ * Apply @attrs to an unbound workqueue @wq. Unless disabled, on NUMA
+ * machines, this function maps a separate pwq to each NUMA node with
+ * possibles CPUs in @attrs->cpumask so that work items are affine to the
+ * NUMA node it was issued on. Older pwqs are released as in-flight work
+ * items finish. Note that a work item which repeatedly requeues itself
+ * back-to-back will stay on its current pwq.
+ *
+ * Performs GFP_KERNEL allocations.
+ *
+ * Return: 0 on success and -errno on failure.
+ */
+int apply_workqueue_attrs(struct workqueue_struct *wq,
+ const struct workqueue_attrs *attrs)
+{
+ int ret;
+
+ /*
+ * CPUs should stay stable across pwq creations and installations.
+ * Pin CPUs, determine the target cpumask for each node and create
+ * pwqs accordingly.
+ */
+
+ get_online_cpus();
+ /*
+ * Lock for alloc_unbound_pwq()
+ */
+ mutex_lock(&wq_pool_mutex);
+ ret = apply_workqueue_attrs_locked(wq, attrs, wq_unbound_cpumask);
+ mutex_unlock(&wq_pool_mutex);
+ put_online_cpus();
+
+ return ret;
+}
+
+
+/**
* wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
* @wq: the target workqueue
* @cpu: the CPU coming up or going down
--
1.8.3.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask
2014-05-16 16:16 [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Frederic Weisbecker
` (3 preceding siblings ...)
2014-05-16 16:16 ` [PATCH 4/5] workqueue: Split apply attrs code from its locking Frederic Weisbecker
@ 2014-05-16 16:16 ` Frederic Weisbecker
2014-05-16 20:50 ` Tejun Heo
2014-07-11 8:41 ` [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Lai Jiangshan
5 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-16 16:16 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
Lai Jiangshan, Mike Galbraith, Paul E. McKenney, Tejun Heo,
Viresh Kumar
Allow modifying the low-level unbound workqueue cpumask through sysfs.
This is performed by traversing the entire workqueue list and calling
apply_workqueue_attrs_locked() on each unbound workqueue, with the new
low-level mask passed in.
Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/workqueue.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 78 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2aa296d..2a12f00 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -293,7 +293,7 @@ static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */
static LIST_HEAD(workqueues); /* PL: list of all workqueues */
static bool workqueue_freezing; /* PL: have wqs started freezing? */
-static cpumask_var_t wq_unbound_cpumask;
+static cpumask_var_t wq_unbound_cpumask; /* PL: low level cpumask for all unbound wqs */
/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
@@ -3643,6 +3643,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
{
struct workqueue_attrs *new_attrs, *tmp_attrs;
struct pool_workqueue **pwq_tbl, *dfl_pwq;
+ cpumask_var_t saved_cpumask;
int node, ret;
/* only unbound workqueues can change attributes */
@@ -3653,15 +3654,25 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
if (WARN_ON((wq->flags & __WQ_ORDERED) && !attrs->no_numa))
return -EINVAL;
+ if (!alloc_cpumask_var(&saved_cpumask, GFP_KERNEL))
+ goto enomem;
+
pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
+
if (!pwq_tbl || !new_attrs || !tmp_attrs)
goto enomem;
/* make a copy of @attrs and sanitize it */
copy_workqueue_attrs(new_attrs, attrs);
- cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbounds_cpumask);
+
+ /*
+ * Apply unbounds_cpumask on the new attrs for pwq and worker pools
+ * creation but save the wq proper cpumask for unbound attrs backup.
+ */
+ cpumask_and(saved_cpumask, new_attrs->cpumask, cpu_possible_mask);
+ cpumask_and(new_attrs->cpumask, saved_cpumask, unbounds_cpumask);
/*
* We may create multiple pwqs with differing cpumasks. Make a
@@ -3693,6 +3704,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
/* all pwqs have been created successfully, let's install'em */
mutex_lock(&wq->mutex);
+ cpumask_copy(new_attrs->cpumask, saved_cpumask);
copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
/* save the previous pwq and install the new one */
@@ -3713,6 +3725,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
ret = 0;
/* fall through */
out_free:
+ free_cpumask_var(saved_cpumask);
free_workqueue_attrs(tmp_attrs);
free_workqueue_attrs(new_attrs);
kfree(pwq_tbl);
@@ -3817,6 +3830,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
goto out_unlock;
copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
+ cpumask_and(target_attrs->cpumask, target_attrs->cpumask, wq_unbound_cpumask);
pwq = unbound_pwq_by_node(wq, node);
/*
@@ -4084,19 +4098,80 @@ static struct bus_type wq_subsys = {
.dev_groups = wq_sysfs_groups,
};
+static int unbounds_cpumask_apply(cpumask_var_t cpumask)
+{
+ struct workqueue_struct *wq;
+ int ret = 0;
+
+ lockdep_assert_held(&wq_pool_mutex);
+
+ list_for_each_entry(wq, &workqueues, list) {
+ struct workqueue_attrs *attrs;
+
+ if (!(wq->flags & WQ_UNBOUND))
+ continue;
+
+ attrs = wq_sysfs_prep_attrs(wq);
+ if (!attrs)
+ return -ENOMEM;
+
+ ret = apply_workqueue_attrs_locked(wq, attrs, cpumask);
+ free_workqueue_attrs(attrs);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+static ssize_t unbounds_cpumask_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ cpumask_var_t cpumask;
+ int ret = -EINVAL;
+
+ if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
+ return -ENOMEM;
+
+ ret = cpumask_parse(buf, cpumask);
+ if (ret)
+ goto out;
+
+ get_online_cpus();
+ if (cpumask_intersects(cpumask, cpu_online_mask)) {
+ mutex_lock(&wq_pool_mutex);
+ ret = unbounds_cpumask_apply(cpumask);
+ if (ret < 0) {
+ /* Warn if rollback itself fails */
+ WARN_ON_ONCE(unbounds_cpumask_apply(wq_unbound_cpumask));
+ } else {
+ cpumask_copy(wq_unbound_cpumask, cpumask);
+ }
+ mutex_unlock(&wq_pool_mutex);
+ }
+ put_online_cpus();
+out:
+ free_cpumask_var(cpumask);
+ return ret ? ret : count;
+}
+
static ssize_t unbounds_cpumask_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
int written;
+ mutex_lock(&wq_pool_mutex);
written = cpumask_scnprintf(buf, PAGE_SIZE, wq_unbound_cpumask);
+ mutex_unlock(&wq_pool_mutex);
+
written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
return written;
}
static struct device_attribute wq_sysfs_cpumask_attr =
- __ATTR(cpumask, 0444, unbounds_cpumask_show, NULL);
+ __ATTR(cpumask, 0644, unbounds_cpumask_show, unbounds_cpumask_store);
static int __init wq_sysfs_init(void)
{
--
1.8.3.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 16:16 ` [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask Frederic Weisbecker
@ 2014-05-16 17:52 ` Christoph Lameter
2014-05-16 18:35 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Lameter @ 2014-05-16 17:52 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Kevin Hilman, Lai Jiangshan, Mike Galbraith,
Paul E. McKenney, Tejun Heo, Viresh Kumar
On Fri, 16 May 2014, Frederic Weisbecker wrote:
> It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
> such that the effective cpumask applied for a given unbound workqueue is
> the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
> the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
Why is there "virtual" directory in the path? Workqueues are "virtual"
devices?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 17:52 ` Christoph Lameter
@ 2014-05-16 18:35 ` Tejun Heo
2014-05-16 18:52 ` Christoph Lameter
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 18:35 UTC (permalink / raw)
To: Christoph Lameter
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, May 16, 2014 at 12:52:26PM -0500, Christoph Lameter wrote:
> On Fri, 16 May 2014, Frederic Weisbecker wrote:
>
> > It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
> > such that the effective cpumask applied for a given unbound workqueue is
> > the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
> > the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
>
> Why is there "virtual" directory in the path? Workqueues are "virtual"
> devices?
It ain't an actual one. That apparently is the preferred place to
present software constructs like workqueues in sysfs.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 18:35 ` Tejun Heo
@ 2014-05-16 18:52 ` Christoph Lameter
2014-05-16 19:00 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Lameter @ 2014-05-16 18:52 UTC (permalink / raw)
To: Tejun Heo
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, 16 May 2014, Tejun Heo wrote:
> On Fri, May 16, 2014 at 12:52:26PM -0500, Christoph Lameter wrote:
> > On Fri, 16 May 2014, Frederic Weisbecker wrote:
> >
> > > It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
> > > such that the effective cpumask applied for a given unbound workqueue is
> > > the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
> > > the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
> >
> > Why is there "virtual" directory in the path? Workqueues are "virtual"
> > devices?
>
> It ain't an actual one. That apparently is the preferred place to
> present software constructs like workqueues in sysfs.
Could we fix that? A workqueue is not a device but more a kernel setting.
/sys/kernel/workqueue/.... ?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 18:52 ` Christoph Lameter
@ 2014-05-16 19:00 ` Tejun Heo
2014-05-16 19:22 ` Tejun Heo
2014-05-16 19:32 ` Christoph Lameter
0 siblings, 2 replies; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 19:00 UTC (permalink / raw)
To: Christoph Lameter
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, May 16, 2014 at 01:52:48PM -0500, Christoph Lameter wrote:
> On Fri, 16 May 2014, Tejun Heo wrote:
>
> > On Fri, May 16, 2014 at 12:52:26PM -0500, Christoph Lameter wrote:
> > > On Fri, 16 May 2014, Frederic Weisbecker wrote:
> > >
> > > > It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
> > > > such that the effective cpumask applied for a given unbound workqueue is
> > > > the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
> > > > the new /sys/devices/virtual/workqueue/cpumask_unbounds file.
> > >
> > > Why is there "virtual" directory in the path? Workqueues are "virtual"
> > > devices?
> >
> > It ain't an actual one. That apparently is the preferred place to
> > present software constructs like workqueues in sysfs.
>
> Could we fix that? A workqueue is not a device but more a kernel setting.
>
> /sys/kernel/workqueue/.... ?
Right, that could have been more in line with slab files. It's
already too late tho. This has been exposed for quite a while now.
Urgh...
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 19:00 ` Tejun Heo
@ 2014-05-16 19:22 ` Tejun Heo
2014-05-16 19:32 ` Christoph Lameter
1 sibling, 0 replies; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 19:22 UTC (permalink / raw)
To: Christoph Lameter
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, May 16, 2014 at 03:00:42PM -0400, Tejun Heo wrote:
> > /sys/kernel/workqueue/.... ?
>
> Right, that could have been more in line with slab files. It's
> already too late tho. This has been exposed for quite a while now.
> Urgh...
Okay, another difference, so things under /sys/devices generate
uevents and can thus be configured on boot or dynamically as the nodes
are created, which doesn't work for other /sys directories, and
top-level directories take more boilerplate code. Maybe the right
thing to do is moving /sys/kernel under /sys/devices and making
/sys/kernel a symlink? It kinda sucks that the whole thing sits
outside the usual notification mechanism.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 19:00 ` Tejun Heo
2014-05-16 19:22 ` Tejun Heo
@ 2014-05-16 19:32 ` Christoph Lameter
2014-05-16 19:34 ` Tejun Heo
1 sibling, 1 reply; 32+ messages in thread
From: Christoph Lameter @ 2014-05-16 19:32 UTC (permalink / raw)
To: Tejun Heo
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, 16 May 2014, Tejun Heo wrote:
> > Could we fix that? A workqueue is not a device but more a kernel setting.
> >
> > /sys/kernel/workqueue/.... ?
>
> Right, that could have been more in line with slab files. It's
> already too late tho. This has been exposed for quite a while now.
> Urgh...
It sets a bad precedent. So move to /sys/kernel/workqueue and lets have a
symlink that goes back?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 19:32 ` Christoph Lameter
@ 2014-05-16 19:34 ` Tejun Heo
2014-05-16 19:45 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 19:34 UTC (permalink / raw)
To: Christoph Lameter
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, May 16, 2014 at 3:32 PM, Christoph Lameter <cl@linux.com> wrote:
> It sets a bad precedent. So move to /sys/kernel/workqueue and lets have a
> symlink that goes back?
Hmm... I don't think it's a good idea to lose uevent. It's an integral
part in configuring sysfs. Wouldn't it make more sense to move
/sys/kernel under /sys/devices?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 19:34 ` Tejun Heo
@ 2014-05-16 19:45 ` Tejun Heo
2014-05-16 23:02 ` Christoph Lameter
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 19:45 UTC (permalink / raw)
To: Christoph Lameter
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello, again.
On Fri, May 16, 2014 at 3:34 PM, Tejun Heo <tj@kernel.org> wrote:
> On Fri, May 16, 2014 at 3:32 PM, Christoph Lameter <cl@linux.com> wrote:
>> It sets a bad precedent. So move to /sys/kernel/workqueue and lets have a
>> symlink that goes back?
>
> Hmm... I don't think it's a good idea to lose uevent. It's an integral
> part in configuring sysfs. Wouldn't it make more sense to move
> /sys/kernel under /sys/devices?
So, the thing is sysfs has been collecting everything under
/sys/devices because other top level directories added complexity
while missing out on basic event mechanism. If you look at other
top-level directories, other than /sys/modules and /sys/kernel,
everything else is a symlink into the /sys/devices hierarchy, just kept
around for compatibility. For static knobs, it may not matter but for
things like slab and workqueue which can be dynamically created and
destroyed, being hooked up into uevent mechanism is a necessity, so I
really think we better sort it out properly. Maybe it can use
/sys/devices/virtual or maybe we'll need a separate directory such
/sys/devices/kernel but I really don't find moving workqueue to
/sys/kernel at this point a good idea.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-16 16:16 ` [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues Frederic Weisbecker
@ 2014-05-16 20:12 ` Tejun Heo
2014-05-17 13:41 ` Frederic Weisbecker
2014-05-21 7:29 ` Lai Jiangshan
1 sibling, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 20:12 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Lai Jiangshan, Christoph Lameter, Kevin Hilman,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello,
On Fri, May 16, 2014 at 06:16:51PM +0200, Frederic Weisbecker wrote:
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> Changing the attributions of a workqueue imply the addition of new pwqs
> to replace the old ones. But the current implementation doesn't handle
> ordered workqueues because they can't carry multi-pwqs without breaking
> ordering. Hence ordered workqueues currently aren't allowed to call
> apply_workqueue_attrs().
...
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> Cc: Mike Galbraith <bitbucket@online.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Do you mind touching up the description and comment a bit as it's
going through you? They have gotten a lot better (kudos to Lai :) but
I still feel the need to touch them up a bit before applying. I'd
really appreciate if you can do it as part of your series.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c3f076f..c68e84f 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1355,8 +1355,16 @@ retry:
> * If @work was previously on a different pool, it might still be
> * running there, in which case the work needs to be queued on that
> * pool to guarantee non-reentrancy.
> + *
> + * We guarantee that only one pwq is active on an ordered workqueue.
> + * That alone enforces non-reentrancy for works. So ordered works don't
Let's use "work items" instead of "works".
> + * need to be requeued to their previous pool. Not to mention that
> + * an ordered work requeing itself over and over on the same pool may
> + * prevent a pwq from being released in case of a pool switch. The
> + * newest pool in that case couldn't switch in and its pending works
> + * would starve.
> */
> - last_pool = get_work_pool(work);
> + last_pool = wq->flags & __WQ_ORDERED ? NULL : get_work_pool(work);
> if (last_pool && last_pool != pwq->pool) {
> struct worker *worker;
I'm not a big fan of the fact that ordered queues need to be handled
differently when queueing, but as the code is currently written, this
is pretty much necessary to maintain execution order, right?
Otherwise, you end up with requeueing work items targeting the pwq it
was executing on and new ones targeting the newest one screwing up the
ordering. I think that'd be a lot more important to note in the
comment. This is a correctness measure. Back-to-back requeueing
being affected by this is just a side-effect.
Also, can you please use plain if/else instead of "?:"? This isn't a
simple logic and I don't think compressing it with ?: is a good idea.
> @@ -3708,6 +3712,13 @@ static void rcu_free_pwq(struct rcu_head *rcu)
> container_of(rcu, struct pool_workqueue, rcu));
> }
>
> +static struct pool_workqueue *oldest_pwq(struct workqueue_struct *wq)
> +{
> + return list_last_entry(&wq->pwqs, struct pool_workqueue, pwqs_node);
> +}
> +
> +static void pwq_adjust_max_active(struct pool_workqueue *pwq);
> +
> /*
> * Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt
> * and needs to be destroyed.
> @@ -3723,14 +3734,12 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
> if (WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND)))
> return;
>
> - /*
> - * Unlink @pwq. Synchronization against wq->mutex isn't strictly
> - * necessary on release but do it anyway. It's easier to verify
> - * and consistent with the linking path.
> - */
> mutex_lock(&wq->mutex);
> list_del_rcu(&pwq->pwqs_node);
> is_last = list_empty(&wq->pwqs);
> + /* try to activate the oldest pwq when needed */
> + if (!is_last && (wq->flags & __WQ_ORDERED))
Why bother with @is_last when it's used only once and the test is
trivial? Is the test even necessary? Invoking
pwq_adjust_max_active() directly should work, no? Also, this needs
a whole lot more comment explaining what's going on.
> + pwq_adjust_max_active(oldest_pwq(wq));
> mutex_unlock(&wq->mutex);
>
> mutex_lock(&wq_pool_mutex);
> @@ -3749,6 +3758,16 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
> }
> }
>
> +static bool pwq_active(struct pool_workqueue *pwq)
> +{
> + /* Only the oldest pwq is active in the ordered wq */
> + if (pwq->wq->flags & __WQ_ORDERED)
> + return pwq == oldest_pwq(pwq->wq);
> +
> + /* All pwqs in the non-ordered wq are active */
> + return true;
> +}
Just collapse it into the calling function. This obfuscates more than
helps.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask
2014-05-16 16:16 ` [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask Frederic Weisbecker
@ 2014-05-16 20:50 ` Tejun Heo
2014-05-20 19:32 ` Frederic Weisbecker
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 20:50 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Christoph Lameter, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello, Frederic.
On Fri, May 16, 2014 at 06:16:55PM +0200, Frederic Weisbecker wrote:
> @@ -3643,6 +3643,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
> {
> struct workqueue_attrs *new_attrs, *tmp_attrs;
> struct pool_workqueue **pwq_tbl, *dfl_pwq;
> + cpumask_var_t saved_cpumask;
> int node, ret;
>
> /* only unbound workqueues can change attributes */
> @@ -3653,15 +3654,25 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
> if (WARN_ON((wq->flags & __WQ_ORDERED) && !attrs->no_numa))
> return -EINVAL;
>
> + if (!alloc_cpumask_var(&saved_cpumask, GFP_KERNEL))
> + goto enomem;
> +
> pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
> new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
> tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
> +
> if (!pwq_tbl || !new_attrs || !tmp_attrs)
> goto enomem;
>
> /* make a copy of @attrs and sanitize it */
> copy_workqueue_attrs(new_attrs, attrs);
> - cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbounds_cpumask);
> +
> + /*
> + * Apply unbounds_cpumask on the new attrs for pwq and worker pools
> + * creation but save the wq proper cpumask for unbound attrs backup.
> + */
> + cpumask_and(saved_cpumask, new_attrs->cpumask, cpu_possible_mask);
> + cpumask_and(new_attrs->cpumask, saved_cpumask, unbounds_cpumask);
>
> /*
> * We may create multiple pwqs with differing cpumasks. Make a
> @@ -3693,6 +3704,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
> /* all pwqs have been created successfully, let's install'em */
> mutex_lock(&wq->mutex);
>
> + cpumask_copy(new_attrs->cpumask, saved_cpumask);
> copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
Yeah, this seems like the correct behavior but it's a bit nasty.
Wouldn't creating another application copy be cleaner? If not, can we
at least add more comment explaining why we're doing this?
Hmmmm... shouldn't we be able to apply the mask to tmp_attrs?
Also, isn't the code block involving wq_calc_node_cpumask() kinda
broken for this? It uses @attrs which is not masked by
@unbounds_cpumask. This used to be okay as whatever it calculates
would fall in @cpu_possible_mask anyway but that no longer is the
case, right?
Another one, why is @unbounds_cpumask passed in as an argument? Can't
it use the global variable directly?
> +static int unbounds_cpumask_apply(cpumask_var_t cpumask)
> +{
> + struct workqueue_struct *wq;
> + int ret;
> +
> + lockdep_assert_held(&wq_pool_mutex);
> +
> + list_for_each_entry(wq, &workqueues, list) {
> + struct workqueue_attrs *attrs;
> +
> + if (!(wq->flags & WQ_UNBOUND))
> + continue;
> +
> + attrs = wq_sysfs_prep_attrs(wq);
> + if (!attrs)
> + return -ENOMEM;
> +
> + ret = apply_workqueue_attrs_locked(wq, attrs, cpumask);
> + free_workqueue_attrs(attrs);
> + if (ret)
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static ssize_t unbounds_cpumask_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + cpumask_var_t cpumask;
> + int ret = -EINVAL;
> +
> + if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + ret = cpumask_parse(buf, cpumask);
> + if (ret)
> + goto out;
> +
> + get_online_cpus();
> + if (cpumask_intersects(cpumask, cpu_online_mask)) {
> + mutex_lock(&wq_pool_mutex);
> + ret = unbounds_cpumask_apply(cpumask);
> + if (ret < 0) {
> + /* Warn if rollback itself fails */
> + WARN_ON_ONCE(unbounds_cpumask_apply(wq_unbound_cpumask));
Hmmm... but there's nothing which makes rolling back more likely to
succeed compared to the original applications. It's gonna allocate
more pwqs. Triggering WARN_ON_ONCE() seems weird.
That said, yeah, short of splitting apply_workqueue_attrs_locked()
into two stages - alloc and commit, I don't think proper error
recovery is possible.
There are a couple options, I guess.
1. Split apply_workqueue_attrs_locked() into two stages. The first
stage creates new pwqs as necessary and cache it. Each @wq would
need a pointer to remember these staged pwq_tbl. The second stage
commits them, which obviously can't fail.
2. Proper error handling is hard. Just do pr_warn() on each failure
and continue to try to apply and always return 0.
If #1 isn't too complicated (would it be?), it'd be the better option;
otherwise, well, #2 should work most of the time, eh?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 19:45 ` Tejun Heo
@ 2014-05-16 23:02 ` Christoph Lameter
2014-05-16 23:48 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Lameter @ 2014-05-16 23:02 UTC (permalink / raw)
To: Tejun Heo
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, 16 May 2014, Tejun Heo wrote:
>
> So, the thing is sysfs has been collecting everything under
> /sys/devices because other top level directories added complexity
> while missing out on basic event mechanism. If you look at other
Make the uevent stuff work on all of sysfs?

* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 23:02 ` Christoph Lameter
@ 2014-05-16 23:48 ` Tejun Heo
2014-05-17 22:45 ` Christoph Lameter
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-16 23:48 UTC (permalink / raw)
To: Christoph Lameter
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello,
On Fri, May 16, 2014 at 06:02:55PM -0500, Christoph Lameter wrote:
> On Fri, 16 May 2014, Tejun Heo wrote:
> > So, the thing is sysfs has been collecting everything under
> > /sys/devices because other top level directories added complexity
> > while missing out on basic event mechanism. If you look at other
>
> Make the uevent stuff work on all of sysfs?
I don't know. Maybe. Likely a lot more work involving user space
changes tho. Also, it runs contrary to what we've been doing to other
top-level cgroup directories. Any reason not to do, say,
/sys/devices/kernel?
Thanks.
--
tejun
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-16 20:12 ` Tejun Heo
@ 2014-05-17 13:41 ` Frederic Weisbecker
2014-05-19 20:15 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-17 13:41 UTC (permalink / raw)
To: Tejun Heo
Cc: LKML, Lai Jiangshan, Christoph Lameter, Kevin Hilman,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, May 16, 2014 at 04:12:25PM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, May 16, 2014 at 06:16:51PM +0200, Frederic Weisbecker wrote:
> > From: Lai Jiangshan <laijs@cn.fujitsu.com>
> >
> > Changing the attributions of a workqueue imply the addition of new pwqs
> > to replace the old ones. But the current implementation doesn't handle
> > ordered workqueues because they can't carry multiple pwqs without breaking
> > ordering. Hence ordered workqueues currently aren't allowed to call
> > apply_workqueue_attrs().
> ...
> > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Kevin Hilman <khilman@linaro.org>
> > Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Cc: Mike Galbraith <bitbucket@online.de>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
>
> Do you mind touching up the description and comment a bit as it's
> going through you? They have gotten a lot better (kudos to Lai :) but
> I still feel the need to touch them up a bit before applying. I'd
> really appreciate if you can do it as part of your series.
Sure. I actually seldom apply others' patches as-is; I edited,
reworded and reorganized this changelog quite a lot, same for some comments.
But since I wasn't yet sure that this direction was the final one, I probably
left some parts with only light explanations.
I'll do a second pass; just don't hesitate to point out comments or anything else
that needs improvement.
>
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index c3f076f..c68e84f 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -1355,8 +1355,16 @@ retry:
> > * If @work was previously on a different pool, it might still be
> > * running there, in which case the work needs to be queued on that
> > * pool to guarantee non-reentrancy.
> > + *
> > + * We guarantee that only one pwq is active on an ordered workqueue.
> > + * That alone enforces non-reentrancy for works. So ordered works don't
>
> Let's use "work items" instead of "works".
Ok.
>
> > + * need to be requeued to their previous pool. Not to mention that
> > + * an ordered work requeing itself over and over on the same pool may
> > + * prevent a pwq from being released in case of a pool switch. The
> > + * newest pool in that case couldn't switch in and its pending works
> > + * would starve.
> > */
> > - last_pool = get_work_pool(work);
> > + last_pool = wq->flags & __WQ_ORDERED ? NULL : get_work_pool(work);
> > if (last_pool && last_pool != pwq->pool) {
> > struct worker *worker;
>
> I'm not a big fan of the fact that ordered queues need to be handled
> differently when queueing, but as the code is currently written, this
> is pretty much necessary to maintain execution order, right?
>
> Otherwise, you end up with requeueing work items targeting the pwq it
> was executing on and new ones targeting the newest one screwing up the
> ordering. I think that'd be a lot more important to note in the
> comment. This is a correctness measure. Back-to-back requeueing
> being affected by this is just a side-effect.
In the case of ordered workqueues it actually doesn't matter much in
terms of ordering. But it's needed when pwqs are replaced (as happens
in apply_workqueue_attrs()). We must make sure work items requeueing themselves
don't always requeue to the old pwq, otherwise it will never be able to be
switched out and released. Also, the next work items will be queued on the next
pwq, but that one will never be able to run because the old pwq is still
being used by the item requeueing itself. So we also risk starvation on the
new pwq.
But the ordering itself is actually fine for ordered workqueues. It's
enforced by the fact that only one pwq can run at a time for a given workqueue.
OTOH non-ordered workqueues still depend on the above machinery to ensure
ordering at the work item scope, which is what we call non-reentrancy.
So I think the comment is not misleading, but it may deserve to be rephrased.
>
> Also, can you please use plain if/else instead of "?:"? This isn't a
> simple logic and I don't think compressing it with ?: is a good idea.
Agreed, will do.
>
> > @@ -3708,6 +3712,13 @@ static void rcu_free_pwq(struct rcu_head *rcu)
> > container_of(rcu, struct pool_workqueue, rcu));
> > }
> >
> > +static struct pool_workqueue *oldest_pwq(struct workqueue_struct *wq)
> > +{
> > + return list_last_entry(&wq->pwqs, struct pool_workqueue, pwqs_node);
> > +}
> > +
> > +static void pwq_adjust_max_active(struct pool_workqueue *pwq);
> > +
> > /*
> > * Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt
> > * and needs to be destroyed.
> > @@ -3723,14 +3734,12 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
> > if (WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND)))
> > return;
> >
> > - /*
> > - * Unlink @pwq. Synchronization against wq->mutex isn't strictly
> > - * necessary on release but do it anyway. It's easier to verify
> > - * and consistent with the linking path.
> > - */
> > mutex_lock(&wq->mutex);
> > list_del_rcu(&pwq->pwqs_node);
> > is_last = list_empty(&wq->pwqs);
> > + /* try to activate the oldest pwq when needed */
> > + if (!is_last && (wq->flags & __WQ_ORDERED))
>
> Why bother with @is_last when it's used only once and the test is
> trivial? Is the test even necessary? Invoking
> pwq_adjust_max_active() directly should work, no?
It should work without is_last(). But adjusting max_active when the
whole wq is about to be released sounds counter-intuitive, so
I find the if (!is_last && ordered) check helps the code be self-explanatory.
> Also, this needs
> whole lot more comment explaining what's going on.
Agreed, I'll add some details.
>
> > + pwq_adjust_max_active(oldest_pwq(wq));
> > mutex_unlock(&wq->mutex);
> >
> > mutex_lock(&wq_pool_mutex);
> > @@ -3749,6 +3758,16 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
> > }
> > }
> >
> > +static bool pwq_active(struct pool_workqueue *pwq)
> > +{
> > + /* Only the oldest pwq is active in the ordered wq */
> > + if (pwq->wq->flags & __WQ_ORDERED)
> > + return pwq == oldest_pwq(pwq->wq);
> > +
> > + /* All pwqs in the non-ordered wq are active */
> > + return true;
> > +}
>
> Just collapse it into the calling function. This obfuscates more than
> helps.
Yeah, but the condition is already big. Let's hope the result won't be too ugly.
>
> Thanks.
>
> --
> tejun
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-16 23:48 ` Tejun Heo
@ 2014-05-17 22:45 ` Christoph Lameter
2014-05-18 2:51 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Lameter @ 2014-05-17 22:45 UTC (permalink / raw)
To: Tejun Heo
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, 16 May 2014, Tejun Heo wrote:
> > Make the uevent stuff work on all of sysfs?
>
> I don't know. Maybe. Likely a lot more work involving user space
> changes tho. Also, it runs contrary to what we've been doing to other
> top-level cgroup directories. Any reason not to do, say,
> /sys/devices/kernel?
The other kernel tunables are in /sys/kernel.
* Re: [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask
2014-05-17 22:45 ` Christoph Lameter
@ 2014-05-18 2:51 ` Tejun Heo
0 siblings, 0 replies; 32+ messages in thread
From: Tejun Heo @ 2014-05-18 2:51 UTC (permalink / raw)
To: Christoph Lameter
Cc: Frederic Weisbecker, LKML, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello,
On Sat, May 17, 2014 at 05:45:40PM -0500, Christoph Lameter wrote:
> On Fri, 16 May 2014, Tejun Heo wrote:
>
> > > Make the uevent stuff work on all of sysfs?
> >
> > I don't know. Maybe. Likely a lot more work involving user space
> > changes tho. Also, it runs contrary to what we've been doing to other
> > top-level cgroup directories. Any reason not to do, say,
> > /sys/devices/kernel?
>
> The other kernel tunables are in /sys/kernel.
Heh, we're talking past each other. I meant moving everything under
/sys/kernel to /sys/devices/kernel and making /sys/kernel a symlink to
/sys/devices/kernel.
Thanks.
--
tejun
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-17 13:41 ` Frederic Weisbecker
@ 2014-05-19 20:15 ` Tejun Heo
2014-05-20 14:32 ` Frederic Weisbecker
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-19 20:15 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Lai Jiangshan, Christoph Lameter, Kevin Hilman,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello,
On Sat, May 17, 2014 at 03:41:55PM +0200, Frederic Weisbecker wrote:
> > > - last_pool = get_work_pool(work);
> > > + last_pool = wq->flags & __WQ_ORDERED ? NULL : get_work_pool(work);
> > > if (last_pool && last_pool != pwq->pool) {
> > > struct worker *worker;
> >
> > I'm not a big fan of the fact that ordered queues need to be handled
> > differently when queueing, but as the code is currently written, this
> > is pretty much necessary to maintain execution order, right?
> >
> > Otherwise, you end up with requeueing work items targeting the pwq it
> > was executing on and new ones targeting the newest one screwing up the
> > ordering. I think that'd be a lot more important to note in the
> > comment. This is a correctness measure. Back-to-back requeueing
> > being affected by this is just a side-effect.
>
> In the case of ordered workqueues it actually doesn't matter much in
> term of ordering. But it's needed when pwqs are replaced (as it happens
> in apply_workqueue_attrs(). We must make sure works requeueing themselves
> don't always requeue to the old pwq otherwise it will never be able to
> switch and be released. Also the next work items will be queued on the next
But that's the same for other pwqs too. Back-to-back requeueing will
hold back pwq switching on any workqueue.
> pwq but this one will never be able to run due to the old workqueue still
> being used by the item requeing itself. So we also risk starvation on the
> new workqueue.
>
> But the ordering itself is actually fine for ordered workqueue. It's actually
> enforced by the fact that only one pwq can run at a time for a given workqueue.
Maybe I'm confused but I don't think it'd be. Let's say there was an
attribute change with one work item, A, which is performing
back-to-back requeueing and another one, B, which queues itself
intermittently. If B is queued while A is executing, followed by A
requeueing itself, the expected execution order is A - B - A; however,
without the above exception for ordered workqueues, it'd end up A - A
- B because B will end up on the new pwq while A on the older one and
max_active won't be transferred to the new pwq before it becomes
empty.
> > Just collapse it into the calling function. This obfuscates more than
> > helps.
>
> > Yeah but the condition is already big. Let's hope the result won't be too ugly.
I didn't mean that the condition should be encoded in the if
conditional. It's fine to break it out using a separate variable or
whatever. I just don't think breaking it out to a separate function
is helping anything.
Thanks.
--
tejun
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-19 20:15 ` Tejun Heo
@ 2014-05-20 14:32 ` Frederic Weisbecker
2014-05-20 14:35 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-20 14:32 UTC (permalink / raw)
To: Tejun Heo
Cc: LKML, Lai Jiangshan, Christoph Lameter, Kevin Hilman,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Mon, May 19, 2014 at 04:15:31PM -0400, Tejun Heo wrote:
> Hello,
>
> On Sat, May 17, 2014 at 03:41:55PM +0200, Frederic Weisbecker wrote:
> > > > - last_pool = get_work_pool(work);
> > > > + last_pool = wq->flags & __WQ_ORDERED ? NULL : get_work_pool(work);
> > > > if (last_pool && last_pool != pwq->pool) {
> > > > struct worker *worker;
> > >
> > > I'm not a big fan of the fact that ordered queues need to be handled
> > > differently when queueing, but as the code is currently written, this
> > > is pretty much necessary to maintain execution order, right?
> > >
> > > Otherwise, you end up with requeueing work items targeting the pwq it
> > > was executing on and new ones targeting the newest one screwing up the
> > > ordering. I think that'd be a lot more important to note in the
> > > comment. This is a correctness measure. Back-to-back requeueing
> > > being affected by this is just a side-effect.
> >
> > In the case of ordered workqueues it actually doesn't matter much in
> > term of ordering. But it's needed when pwqs are replaced (as it happens
> > in apply_workqueue_attrs(). We must make sure works requeueing themselves
> > don't always requeue to the old pwq otherwise it will never be able to
> > switch and be released. Also the next work items will be queued on the next
>
> But that's the same for other pwqs too. Back-to-back requeueing will
> hold back pwq switching on any workqueue.
I don't think so, because non-ordered pwqs aren't created with a max_active of 0,
so they can run before the old pwq is released. It's not holding back the new
one or creating starvation there.
But maybe I'm forgetting other details.
>
> > pwq but this one will never be able to run due to the old workqueue still
> > being used by the item requeing itself. So we also risk starvation on the
> > new workqueue.
> >
> > But the ordering itself is actually fine for ordered workqueue. It's actually
> > enforced by the fact that only one pwq can run at a time for a given workqueue.
>
> Maybe I'm confused but I don't think it'd be. Let's say there was an
> attribute change with one work item, A, which is performing
> back-to-back requeueing and another one, B, which queues itself
> intermittently. If B is queued while A is executing, followed by A
> requeueing itself, the expected execution order is A - B - A; however,
> without the above exception for ordered workqueues, it'd end up A - A
> - B because B will end up on the new pwq while A on the older one and
> max_active won't be transferred to the new pwq before it becomes
> empty.
Ah right, AAB instead of ABA is indeed possible. I don't know if any workqueue
relies on such a guarantee, but it's possible.
In which case we have one more reason to make an exception to previous-pwq
reuse for ordered workqueues.
>
> > > Just collapse it into the calling function. This obfuscates more than
> > > helps.
> >
> > Yeah but the condition is already big. Let's hope the result won't be too ugly.
>
> I didn't mean that the condition should be encoded in the if
> conditional. It's fine to break it out using a separate variable or
> whatever. I just don't think breaking it out to a separate function
> is helping anything.
Alright.
Thanks.
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-20 14:32 ` Frederic Weisbecker
@ 2014-05-20 14:35 ` Tejun Heo
2014-05-20 15:08 ` Frederic Weisbecker
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-20 14:35 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Lai Jiangshan, Christoph Lameter, Kevin Hilman,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello,
On Tue, May 20, 2014 at 04:32:31PM +0200, Frederic Weisbecker wrote:
> > But that's the same for other pwqs too. Back-to-back requeueing will
> > hold back pwq switching on any workqueue.
>
> I don't think so, because non-ordered pwqs aren't created with 0 max_active,
> so they can run before the old pwq is released. It's not holding back the new
> one and creating a starvation there.
>
> But maybe I forget other details.
Ah, I was thinking about old pwq not being allowed to be released
while one or more work items are requeueing themselves back-to-back.
Yeap, the new ones can still be used for other work items.
> > Maybe I'm confused but I don't think it'd be. Let's say there was an
> > attribute change with one work item, A, which is performing
> > back-to-back requeueing and another one, B, which queues itself
> > intermittently. If B is queued while A is executing, followed by A
> > requeueing itself, the expected execution order is A - B - A; however,
> > without the above exception for ordered workqueues, it'd end up A - A
> > - B because B will end up on the new pwq while A on the older one and
> > max_active won't be transferred to the new pwq before it becomes
> > empty.
>
> Ah right, AAB instead of ABA is indeed possible. I don't know if any workqueue
> relies on such a guarantee, but it's possible.
That's part of the ordering guarantee of ordered workqueues so we
better not break it.
> In which case we have one more reason to make an exception to previous-pwq
> reuse for ordered workqueues.
Yeah, I agree the special treatment is necessary but the current
comment is misleading.
Thanks.
--
tejun
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-20 14:35 ` Tejun Heo
@ 2014-05-20 15:08 ` Frederic Weisbecker
0 siblings, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-20 15:08 UTC (permalink / raw)
To: Tejun Heo
Cc: LKML, Lai Jiangshan, Christoph Lameter, Kevin Hilman,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Tue, May 20, 2014 at 10:35:34AM -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, May 20, 2014 at 04:32:31PM +0200, Frederic Weisbecker wrote:
> > > But that's the same for other pwqs too. Back-to-back requeueing will
> > > hold back pwq switching on any workqueue.
> >
> > I don't think so, because non-ordered pwqs aren't created with 0 max_active,
> > so they can run before the old pwq is released. It's not holding back the new
> > one and creating a starvation there.
> >
> > But maybe I forget other details.
>
> Ah, I was thinking about old pwq not being allowed to be released
> while one or more work items are requeueing themselves back-to-back.
Right. OTOH, if you have non-deferred work items requeuing themselves
back to back forever, you may have a much bigger problem than just a few
unreleased bytes :)
> Yeap, the new ones can still be used for other work items.
>
> > > Maybe I'm confused but I don't think it'd be. Let's say there was an
> > > attribute change with one work item, A, which is performing
> > > back-to-back requeueing and another one, B, which queues itself
> > > intermittently. If B is queued while A is executing, followed by A
> > > requeueing itself, the expected execution order is A - B - A; however,
> > > without the above exception for ordered workqueues, it'd end up A - A
> > > - B because B will end up on the new pwq while A on the older one and
> > > max_active won't be transferred to the new pwq before it becomes
> > > empty.
> >
> > Ah right, AAB instead of ABA is indeed possible. I don't know if any workqueue
> > relies on such a guarantee, but it's possible.
>
> That's part of the ordering guarantee of ordered workqueues so we
> better not break it.
Ok.
>
> > In which case we have one more reason to make an exception to previous-pwq
> > reuse for ordered workqueues.
>
> Yeah, I agree the special treatment is necessary but the current
> comment is misleading.
Ok got it, I'll try to improve the comment.
Thanks.
* Re: [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask
2014-05-16 20:50 ` Tejun Heo
@ 2014-05-20 19:32 ` Frederic Weisbecker
2014-05-20 19:56 ` Tejun Heo
0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-20 19:32 UTC (permalink / raw)
To: Tejun Heo
Cc: LKML, Christoph Lameter, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Fri, May 16, 2014 at 04:50:50PM -0400, Tejun Heo wrote:
> Hello, Frederic.
>
> On Fri, May 16, 2014 at 06:16:55PM +0200, Frederic Weisbecker wrote:
> > @@ -3643,6 +3643,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
> > {
> > struct workqueue_attrs *new_attrs, *tmp_attrs;
> > struct pool_workqueue **pwq_tbl, *dfl_pwq;
> > + cpumask_var_t saved_cpumask;
> > int node, ret;
> >
> > /* only unbound workqueues can change attributes */
> > @@ -3653,15 +3654,25 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
> > if (WARN_ON((wq->flags & __WQ_ORDERED) && !attrs->no_numa))
> > return -EINVAL;
> >
> > + if (!alloc_cpumask_var(&saved_cpumask, GFP_KERNEL))
> > + goto enomem;
> > +
> > pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
> > new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
> > tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
> > +
> > if (!pwq_tbl || !new_attrs || !tmp_attrs)
> > goto enomem;
> >
> > /* make a copy of @attrs and sanitize it */
> > copy_workqueue_attrs(new_attrs, attrs);
> > - cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbounds_cpumask);
> > +
> > + /*
> > + * Apply unbounds_cpumask on the new attrs for pwq and worker pools
> > + * creation but save the wq proper cpumask for unbound attrs backup.
> > + */
> > + cpumask_and(saved_cpumask, new_attrs->cpumask, cpu_possible_mask);
> > + cpumask_and(new_attrs->cpumask, saved_cpumask, unbounds_cpumask);
> >
> > /*
> > * We may create multiple pwqs with differing cpumasks. Make a
> > @@ -3693,6 +3704,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
> > /* all pwqs have been created successfully, let's install'em */
> > mutex_lock(&wq->mutex);
> >
> > + cpumask_copy(new_attrs->cpumask, saved_cpumask);
> > copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
>
> Yeah, this seems like the correct behavior but it's a bit nasty.
> Wouldn't creating another application copy be cleaner?
I'm not sure; do you mean an intermediate call to apply_workqueue_attrs()?
That still wouldn't solve it.
Now if you mean a new intermediate attrs instead of a plain cpumask, then
yeah, that works.
> If not, can we
> at least add more comment explaining why we're doing this?
Sure.
> Hmmmm... shouldn't we be able to apply the mask to tmp_attrs?
Ah maybe after all. We just need to create the dfl with the tmp attrs.
> Also, isn't the code block involving wq_calc_node_cpumask() kinda
> broken for this? It uses @attrs which is not masked by
> @unbounds_cpumask. This used to be okay as whatever it calculates
> would fall in @cpu_possible_mask anyway but that no longer is the
> case, right?
Good catch. I was somehow convinced it was using new_attrs there.
But then we must pass attrs that are AND'ed against unbounds_cpumask.
tmp_attrs can't be used for that anymore, so we must do the AND on new_attrs.
So we still need the temporary cpumask (or attrs).
>
> Another one, why is @unbounds_cpumask passed in as an argument? Can't
> it use the global variable directly?
That's needed for rolling back. The new mask is only copied if
attrs application succeeds.
We could use the global variable, but then we'd need to store the old cpumask
when applying a new one, to be able to roll back on failure.
>
> > +static int unbounds_cpumask_apply(cpumask_var_t cpumask)
> > +{
> > + struct workqueue_struct *wq;
> > + int ret;
> > +
> > + lockdep_assert_held(&wq_pool_mutex);
> > +
> > + list_for_each_entry(wq, &workqueues, list) {
> > + struct workqueue_attrs *attrs;
> > +
> > + if (!(wq->flags & WQ_UNBOUND))
> > + continue;
> > +
> > + attrs = wq_sysfs_prep_attrs(wq);
> > + if (!attrs)
> > + return -ENOMEM;
> > +
> > + ret = apply_workqueue_attrs_locked(wq, attrs, cpumask);
> > + free_workqueue_attrs(attrs);
> > + if (ret)
> > + break;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static ssize_t unbounds_cpumask_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + cpumask_var_t cpumask;
> > + int ret = -EINVAL;
> > +
> > + if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
> > + return -ENOMEM;
> > +
> > + ret = cpumask_parse(buf, cpumask);
> > + if (ret)
> > + goto out;
> > +
> > + get_online_cpus();
> > + if (cpumask_intersects(cpumask, cpu_online_mask)) {
> > + mutex_lock(&wq_pool_mutex);
> > + ret = unbounds_cpumask_apply(cpumask);
> > + if (ret < 0) {
> > + /* Warn if rollback itself fails */
> > + WARN_ON_ONCE(unbounds_cpumask_apply(wq_unbound_cpumask));
>
> Hmmm... but there's nothing which makes rolling back more likely to
> succeed compared to the original applications. It's gonna allocate
> more pwqs. Triggering WARN_ON_ONCE() seems weird.
Yeah, but that's the least we can do. If we fail to even recover the old cpumask,
the user should know about the half-applied state.
>
> That said, yeah, short of splitting apply_workqueue_attrs_locked()
> into two stages - alloc and commit, I don't think proper error
> recovery is possible.
>
> There are a couple options, I guess.
>
> 1. Split apply_workqueue_attrs_locked() into two stages. The first
> stage creates new pwqs as necessary and cache it. Each @wq would
> need a pointer to remember these staged pwq_tbl. The second stage
> commits them, which obviously can't fail.
That's already what it does. The problem is not that apply_workqueue_attrs()
fails in the middle. It's rather that we may succeed in applying the new cpumask
to some workqueues but fail on others. Then, if we also fail on rollback,
we end up with some workqueues affined to the new cpumask and others
affined to the old one.
So the only way to enforce either the new or the old affinity is to do
what you suggest, but for the whole list of workqueues: we need to allocate
all the pwqs first and then apply them all.
But that's going to imply fun with a doubly linked list of struct pwq_allocation_object
and such. Or maybe an array. This reminds me a bit of generate_sched_domains(). It's
not going to be _that_ simple nor pretty :)
>
> 2. Proper error handling is hard. Just do pr_warn() on each failure
> and continue to try to apply and always return 0.
>
> If #1 isn't too complicated (would it be?), it'd be the better option;
> otherwise, well, #2 should work most of the time, eh?
Yeah, I think #2 should be more than enough 99% of the time :)
* Re: [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask
2014-05-20 19:32 ` Frederic Weisbecker
@ 2014-05-20 19:56 ` Tejun Heo
2014-05-20 20:08 ` Frederic Weisbecker
0 siblings, 1 reply; 32+ messages in thread
From: Tejun Heo @ 2014-05-20 19:56 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Christoph Lameter, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hello,
On Tue, May 20, 2014 at 09:32:55PM +0200, Frederic Weisbecker wrote:
> > > + cpumask_copy(new_attrs->cpumask, saved_cpumask);
> > > copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
> >
> > Yeah, this seems like the correct behavior but it's a bit nasty.
> > Wouldn't creating another application copy be cleaner?
>
> I'm not sure, you mean an intermediate call to apply_workqueue_attrs()?
> But still that wouldn't solve it.
Oh, I meant making another copy of attrs for application so that we
don't have to modify @new_attrs and then revert it.
> Now if you mean a new intermediate attrs instead of a plain cpumask, then
> yeah that works.
Yeap.
> > If not, can we
> > at least add more comment explaining why we're doing this?
>
> Sure.
>
> > Hmmmm... shouldn't we be able to apply the mask to tmp_attrs?
>
> Ah maybe after all. We just need to create the dfl with the tmp attrs.
>
> > Also, isn't the code block involving wq_calc_node_cpumask() kinda
> > broken for this? It uses @attrs which is not masked by
> > @unbounds_cpumask. This used to be okay as whatever it calculates
> > would fall in @cpu_possible_mask anyway but that no longer is the
> > case, right?
>
> Good catch. I was somehow convinced it was using new_attrs there.
> But then we must pass attrs that are and'ed against unbounds_cpumask.
> tmp_attrs can't be used anymore for that so we must do the and on new_attrs.
>
> So we still need the temporary cpumask (or attrs).
>
> > Another one, why is @unbounds_cpumask passed in as an argument? Can't
> > it use the global variable directly?
>
> That's needed for rolling back. The new mask is only copied if
> we succeed on attrs application.
>
> We can use a global variable but then we need to store the old cpumask,
> when we apply a new one, to be able to rollback on failure.
I see.
> > > + if (cpumask_intersects(cpumask, cpu_online_mask)) {
> > > + mutex_lock(&wq_pool_mutex);
> > > + ret = unbounds_cpumask_apply(cpumask);
> > > + if (ret < 0) {
> > > + /* Warn if rollback itself fails */
> > > + WARN_ON_ONCE(unbounds_cpumask_apply(wq_unbound_cpumask));
> >
> > Hmmm... but there's nothing which makes rolling back more likely to
> > succeed compared to the original applications. It's gonna allocate
> > more pwqs. Triggering WARN_ON_ONCE() seems weird.
>
> Yeah but that's the least we can do. If we fail to even recover the old cpumask,
> the user should know about the half-applied state.
I'm failing to see how it'd be better than just going through applying
the new mask if we're likely to end up with half-updated states
anyway. What's the point of another layer of best effort logic which
is more likely to fail?
> > 1. Split apply_workqueue_attrs_locked() into two stages. The first
> > stage creates new pwqs as necessary and cache it. Each @wq would
> > need a pointer to remember these staged pwq_tbl. The second stage
> > commits them, which obviously can't fail.
>
> That's already what it does. The problem is not that apply_workqueue_attrs()
> fails in the middle. It's rather that we may succeed in applying the new cpumask
> to some workqueues but fail on others. Then, if we also fail on rollback,
> we end up with some workqueues affined to the new cpumask and others
> affined to the old one.
No, what I mean is split apply_workqueue_attrs() itself into two
stages so that it can do all the allocations up-front and then do a
separate commit stage which can't fail. That way, either all changes
are applied or none.
> So the only way to enforce either the new or the old affinity is to do
> what you suggest, but for the whole list of workqueues. So we need to
> allocate all pwqs first and then apply all of them.
Yes yes. :)
> But it's going to imply fun with a doubly linked list of struct pwq_allocation_object
> and stuff. Or maybe an array. This reminds me a bit of generate_sched_domains(). It's
> not going to be _that_ simple nor pretty :)
Is it tho? Don't we just need to keep a separate staging copy of
prepared pwq_tbl? The commit stage can be pwq_tbl installation.
Looks like it shouldn't be too much of a problem. Am I missing
something?
> > 2. Proper error handling is hard. Just do pr_warn() on each failure
> > and continue to try to apply and always return 0.
> >
> > If #1 isn't too complicated (would it be?), it'd be the better option;
> > otherwise, well, #2 should work most of the time, eh?
>
> Yeah, I think #2 should be more than enough 99% of the time :)
Yeah, if #1 gets too hairy, #2 can be a reluctant option but if #1 is
doable without too much complication, I'd much prefer proper error
handling.
Thanks.
--
tejun
* Re: [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask
2014-05-20 19:56 ` Tejun Heo
@ 2014-05-20 20:08 ` Frederic Weisbecker
0 siblings, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2014-05-20 20:08 UTC (permalink / raw)
To: Tejun Heo
Cc: LKML, Christoph Lameter, Kevin Hilman, Lai Jiangshan,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
On Tue, May 20, 2014 at 03:56:56PM -0400, Tejun Heo wrote:
> > > Hmmm... but there's nothing which makes rolling back more likely to
> > > succeed compared to the original applications. It's gonna allocate
> > > more pwqs. Triggering WARN_ON_ONCE() seems weird.
> >
> > Yeah, but that's the least we can do. If we fail to even recover the old cpumask,
> > the user should know about the half-state failure.
>
> I'm failing to see how it'd be better than just going through applying
> the new mask if we're likely to end up with half-updated states
> anyway. What's the point of another layer of best effort logic which
> is more likely to fail?
If the error is -ENOMEM then yeah, but any other error wants rollback.
> > But it's going to imply fun with a doubly linked list of struct pwq_allocation_object
> > and stuff. Or maybe an array. This reminds me a bit of generate_sched_domains(). It's
> > not going to be _that_ simple nor pretty :)
>
> Is it tho? Don't we just need to keep a separate staging copy of
> prepared pwq_tbl? The commit stage can be pwq_tbl installation.
> Looks like it shouldn't be too much of a problem. Am I missing
> something?
Sure, but that still needs an iteration array/list of pre-allocated objects.
Expect at least one hundred more lines.
>
> > > 2. Proper error handling is hard. Just do pr_warn() on each failure
> > > and continue to try to apply and always return 0.
> > >
> > > If #1 isn't too complicated (would it be?), it'd be the better option;
> > > otherwise, well, #2 should work most of the time, eh?
> >
> > Yeah, I think #2 should be more than enough 99% of the time :)
>
> Yeah, if #1 gets too hairy, #2 can be a reluctant option but if #1 is
> doable without too much complication, I'd much prefer proper error
> handling.
I can try yeah.
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-16 16:16 ` [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues Frederic Weisbecker
2014-05-16 20:12 ` Tejun Heo
@ 2014-05-21 7:29 ` Lai Jiangshan
2014-05-21 19:18 ` Tejun Heo
1 sibling, 1 reply; 32+ messages in thread
From: Lai Jiangshan @ 2014-05-21 7:29 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Christoph Lameter, Kevin Hilman, Mike Galbraith,
Paul E. McKenney, Tejun Heo, Viresh Kumar
On 05/17/2014 12:16 AM, Frederic Weisbecker wrote:
> @@ -3708,6 +3712,13 @@ static void rcu_free_pwq(struct rcu_head *rcu)
> container_of(rcu, struct pool_workqueue, rcu));
> }
>
> +static struct pool_workqueue *oldest_pwq(struct workqueue_struct *wq)
> +{
> + return list_last_entry(&wq->pwqs, struct pool_workqueue, pwqs_node);
> +}
> +
> +static void pwq_adjust_max_active(struct pool_workqueue *pwq);
Hi, Tejun,
Should we reorder pwq_adjust_max_active() to avoid this forward declaration?
(Move pwq_adjust_max_active() to just before rcu_free_pwq().)
Thanks,
Lai
> +
> /*
> * Scheduled on system_wq by put_pwq() when an unbound pwq hits zero refcnt
> * and needs to be destroyed.
> @@ -3723,14 +3734,12 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
> if (WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND)))
> return;
>
> - /*
> - * Unlink @pwq. Synchronization against wq->mutex isn't strictly
> - * necessary on release but do it anyway. It's easier to verify
> - * and consistent with the linking path.
> - */
> mutex_lock(&wq->mutex);
> list_del_rcu(&pwq->pwqs_node);
> is_last = list_empty(&wq->pwqs);
> + /* try to activate the oldest pwq when needed */
> + if (!is_last && (wq->flags & __WQ_ORDERED))
> + pwq_adjust_max_active(oldest_pwq(wq));
> mutex_unlock(&wq->mutex);
>
> mutex_lock(&wq_pool_mutex);
* Re: [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues
2014-05-21 7:29 ` Lai Jiangshan
@ 2014-05-21 19:18 ` Tejun Heo
0 siblings, 0 replies; 32+ messages in thread
From: Tejun Heo @ 2014-05-21 19:18 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Frederic Weisbecker, LKML, Christoph Lameter, Kevin Hilman,
Mike Galbraith, Paul E. McKenney, Viresh Kumar
Hey,
On Wed, May 21, 2014 at 03:29:08PM +0800, Lai Jiangshan wrote:
> Should we reorder pwq_adjust_max_active() to avoid this forward declaration?
> (Move pwq_adjust_max_active() to just before rcu_free_pwq().)
Yeah, sure. Sounds good to me.
Thanks.
--
tejun
* Re: [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3
2014-05-16 16:16 [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Frederic Weisbecker
` (4 preceding siblings ...)
2014-05-16 16:16 ` [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask Frederic Weisbecker
@ 2014-07-11 8:41 ` Lai Jiangshan
2014-07-11 13:11 ` Frederic Weisbecker
5 siblings, 1 reply; 32+ messages in thread
From: Lai Jiangshan @ 2014-07-11 8:41 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Christoph Lameter, Kevin Hilman, Mike Galbraith,
Paul E. McKenney, Tejun Heo, Viresh Kumar
Hi, Frederic
I'd like to take over this work unless you are still working on it.
I would do some cleanup first, so progress may be a bit slow on my side.
Thanks,
Lai
On 05/17/2014 12:16 AM, Frederic Weisbecker wrote:
> So in this version I actually save the cpumask belonging to wq (before
> it's intersected against the low level cpumask) in its unbounds attrs.
>
> But the attrs passed to pwq and worker pools have the low level cpumask
> computed against the wq cpumask.
>
> It makes it easier that way as the wq cpumask itself remains untouched.
> It's a user setting so it must stay intact. OTOH the cpumask of pwqs and
> worker pools only belong to the implementation so it's the right
> place to store the effective cpumask.
>
> Also wq_update_unbound_numa() is fixed, and cpu_possible_mask()
> is restored as a base. Thanks to Lai!
>
> Thanks,
> Frederic
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> core/workqueue-v5
> ---
>
> Frederic Weisbecker (4):
> workqueue: Reorder sysfs code
> workqueue: Create low-level unbound workqueues cpumask
> workqueue: Split apply attrs code from its locking
> workqueue: Allow modifying low level unbound workqueue cpumask
>
> Lai Jiangshan (1):
> workqueue: Allow changing attributions of ordered workqueues
>
>
> kernel/workqueue.c | 1674 ++++++++++++++++++++++++++++------------------------
> 1 file changed, 900 insertions(+), 774 deletions(-)
> .
>
* Re: [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3
2014-07-11 8:41 ` [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Lai Jiangshan
@ 2014-07-11 13:11 ` Frederic Weisbecker
0 siblings, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2014-07-11 13:11 UTC (permalink / raw)
To: Lai Jiangshan
Cc: LKML, Christoph Lameter, Kevin Hilman, Mike Galbraith,
Paul E. McKenney, Tejun Heo, Viresh Kumar
On Fri, Jul 11, 2014 at 04:41:51PM +0800, Lai Jiangshan wrote:
> Hi, Frederic
>
> I'd like to take over this work unless you are still working on it.
> I would do some cleanup first, so progress may be a bit slow on my side.
Hi Lai,
I was about to iterate again on that patchset, but I would certainly be
happy if you took it over, given the amount of stuff on my TODO list.
I suggest you take this branch:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
core/workqueue-v6
It's a big rebase on top of latest -rc1.
Also, I suggest using "--patience" with git diff/log/format-patch, etc.,
because the changes contain big code moves (like the sysfs part) that the
default diff doesn't display well.
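A small self-contained demonstration of the suggestion (scratch repo and file names are made up for the example):

```shell
# Build a throwaway repo, move a block of code, and diff it.
# --patience tends to show a code move as clean whole-function hunks
# instead of interleaving unrelated +/- lines.
tmpdir=$(mktemp -d) && cd "$tmpdir"
git init -q demo && cd demo
printf 'static void a(void) {}\nstatic void b(void) {}\n' > wq.c
git add wq.c && git -c user.name=t -c user.email=t@t commit -qm base
printf 'static void b(void) {}\nstatic void a(void) {}\n' > wq.c
git diff --patience -- wq.c
# The same flag works on log and format-patch:
git log -p --patience -- wq.c
```

For a review like this one, running `git format-patch --patience` when generating the series would give reviewers the same readable hunks.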
I can still help you on this patchset BTW if you need. I believe that
delayed workqueues weren't well handled with it, maybe I should
check further if they need some more treatment.
Thanks.
>
> Thanks,
> Lai
>
>
> On 05/17/2014 12:16 AM, Frederic Weisbecker wrote:
> > So in this version I actually save the cpumask belonging to wq (before
> > it's intersected against the low level cpumask) in its unbounds attrs.
> >
> > But the attrs passed to pwq and worker pools have the low level cpumask
> > computed against the wq cpumask.
> >
> > It makes it easier that way as the wq cpumask itself remains untouched.
> > It's a user setting so it must stay intact. OTOH the cpumask of pwqs and
> > worker pools only belong to the implementation so it's the right
> > place to store the effective cpumask.
> >
> > Also wq_update_unbound_numa() is fixed, and cpu_possible_mask()
> > is restored as a base. Thanks to Lai!
> >
> > Thanks,
> > Frederic
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > core/workqueue-v5
> > ---
> >
> > Frederic Weisbecker (4):
> > workqueue: Reorder sysfs code
> > workqueue: Create low-level unbound workqueues cpumask
> > workqueue: Split apply attrs code from its locking
> > workqueue: Allow modifying low level unbound workqueue cpumask
> >
> > Lai Jiangshan (1):
> > workqueue: Allow changing attributions of ordered workqueues
> >
> >
> > kernel/workqueue.c | 1674 ++++++++++++++++++++++++++++------------------------
> > 1 file changed, 900 insertions(+), 774 deletions(-)
> > .
> >
>
end of thread, other threads: [~2014-07-11 13:12 UTC | newest]
Thread overview: 32+ messages
2014-05-16 16:16 [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 1/5] workqueue: Allow changing attributions of ordered workqueues Frederic Weisbecker
2014-05-16 20:12 ` Tejun Heo
2014-05-17 13:41 ` Frederic Weisbecker
2014-05-19 20:15 ` Tejun Heo
2014-05-20 14:32 ` Frederic Weisbecker
2014-05-20 14:35 ` Tejun Heo
2014-05-20 15:08 ` Frederic Weisbecker
2014-05-21 7:29 ` Lai Jiangshan
2014-05-21 19:18 ` Tejun Heo
2014-05-16 16:16 ` [PATCH 2/5] workqueue: Reorder sysfs code Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 3/5] workqueue: Create low-level unbound workqueues cpumask Frederic Weisbecker
2014-05-16 17:52 ` Christoph Lameter
2014-05-16 18:35 ` Tejun Heo
2014-05-16 18:52 ` Christoph Lameter
2014-05-16 19:00 ` Tejun Heo
2014-05-16 19:22 ` Tejun Heo
2014-05-16 19:32 ` Christoph Lameter
2014-05-16 19:34 ` Tejun Heo
2014-05-16 19:45 ` Tejun Heo
2014-05-16 23:02 ` Christoph Lameter
2014-05-16 23:48 ` Tejun Heo
2014-05-17 22:45 ` Christoph Lameter
2014-05-18 2:51 ` Tejun Heo
2014-05-16 16:16 ` [PATCH 4/5] workqueue: Split apply attrs code from its locking Frederic Weisbecker
2014-05-16 16:16 ` [PATCH 5/5] workqueue: Allow modifying low level unbound workqueue cpumask Frederic Weisbecker
2014-05-16 20:50 ` Tejun Heo
2014-05-20 19:32 ` Frederic Weisbecker
2014-05-20 19:56 ` Tejun Heo
2014-05-20 20:08 ` Frederic Weisbecker
2014-07-11 8:41 ` [RFC PATCH 0/5] workqueue: Introduce low-level unbound wq sysfs cpumask v3 Lai Jiangshan
2014-07-11 13:11 ` Frederic Weisbecker