netdev.vger.kernel.org archive mirror
* [PATCH 0/2] IRQ affinity reverse-mapping
@ 2011-01-04 19:37 Ben Hutchings
  2011-01-04 19:38 ` [PATCH 1/2] genirq: Add IRQ affinity notifiers Ben Hutchings
  2011-01-04 19:39 ` [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping Ben Hutchings
  0 siblings, 2 replies; 11+ messages in thread
From: Ben Hutchings @ 2011-01-04 19:37 UTC
  To: Thomas Gleixner, David Miller
  Cc: Tom Herbert, linux-kernel, netdev, linux-net-drivers

This patch series is intended to support queue selection on multiqueue
IRQ-per-queue network devices (accelerated RFS and XPS-MQ) and
potentially queue selection for other classes of multiqueue device.

The first patch implements IRQ affinity notifiers, based on the outline
that Thomas wrote in response to my earlier patch series for accelerated RFS.

The second patch adds a generic reverse-map from CPUs to objects with
affinity, plus functions to maintain such a mapping based on the new IRQ
affinity notifiers.

I would like to be able to use this functionality in networking for
2.6.38.  Thomas, if you are happy with this, could these changes go
through net-next-2.6?  Alternatively, if Linus pulls from linux-2.6-tip
and David pulls from Linus during the merge window, I can (re-)submit
the dependent changes after that.

Ben.

Ben Hutchings (2):
  genirq: Add IRQ affinity notifiers
  lib: cpu_rmap: CPU affinity reverse-mapping

 include/linux/cpu_rmap.h  |   73 +++++++++++++
 include/linux/interrupt.h |   41 +++++++
 include/linux/irqdesc.h   |    3 +
 kernel/irq/manage.c       |   81 ++++++++++++++
 lib/Kconfig               |    4 +
 lib/Makefile              |    2 +
 lib/cpu_rmap.c            |  262 +++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 466 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cpu_rmap.h
 create mode 100644 lib/cpu_rmap.c

-- 
1.7.3.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [PATCH 1/2] genirq: Add IRQ affinity notifiers
  2011-01-04 19:37 [PATCH 0/2] IRQ affinity reverse-mapping Ben Hutchings
@ 2011-01-04 19:38 ` Ben Hutchings
  2011-01-14 19:47   ` Thomas Gleixner
  2011-01-04 19:39 ` [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping Ben Hutchings
  1 sibling, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2011-01-04 19:38 UTC
  To: Thomas Gleixner
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
notification mechanism to support this.
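The lifetime rules this mechanism imposes (the core takes a reference before queuing the work item; the owner may free the structure only from its release callback) can be modelled in a few lines of plain C. This is a hypothetical userspace sketch with toy names (toy_kref, toy_notify), not the kernel API; the real code uses struct kref, container_of() and schedule_work():

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct kref and struct irq_affinity_notify */
struct toy_kref { int count; };

struct toy_notify {
	unsigned int irq;
	struct toy_kref kref;
	void (*notify)(struct toy_notify *n, unsigned int mask);
	void (*release)(struct toy_kref *ref);
	int released;			/* for demonstration only */
};

static void toy_kref_get(struct toy_kref *k) { k->count++; }

static void toy_kref_put(struct toy_notify *n)
{
	if (--n->kref.count == 0)
		n->release(&n->kref);
}

/* Models an affinity change: take a reference, then run the work item
 * (here synchronously; the kernel defers it to process context). */
static void toy_affinity_changed(struct toy_notify *n, unsigned int mask)
{
	toy_kref_get(&n->kref);
	n->notify(n, mask);
	toy_kref_put(n);		/* drop the work item's reference */
}

static unsigned int last_mask;

static void my_notify(struct toy_notify *n, unsigned int mask)
{
	(void)n;
	last_mask = mask;		/* e.g. rebuild a reverse-map here */
}

static void my_release(struct toy_kref *ref)
{
	/* container_of() in the kernel; kref is the second field here */
	struct toy_notify *n = (struct toy_notify *)
		((char *)ref - offsetof(struct toy_notify, kref));
	n->released = 1;		/* a real user would free n here */
}
```

Registration initialises the kref to 1; unregistering drops that reference, and only once any in-flight work item has also dropped its reference does release run, which is why the structure must not be freed before then.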

This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/interrupt.h |   41 +++++++++++++++++++++++
 include/linux/irqdesc.h   |    3 ++
 kernel/irq/manage.c       |   81 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 0 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 55e0d42..09d6039 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -14,6 +14,8 @@
 #include <linux/smp.h>
 #include <linux/percpu.h>
 #include <linux/hrtimer.h>
+#include <linux/kref.h>
+#include <linux/workqueue.h>
 
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
@@ -231,6 +233,28 @@ static inline void resume_device_irqs(void) { };
 static inline int check_wakeup_irqs(void) { return 0; }
 #endif
 
+/**
+ * struct irq_affinity_notify - context for notification of IRQ affinity changes
+ * @irq:		Interrupt to which notification applies
+ * @kref:		Reference count, for internal use
+ * @work:		Work item, for internal use
+ * @notify:		Function to be called on change.  This will be
+ *			called in process context.
+ * @release:		Function to be called on release.  This will be
+ *			called in process context.  Once registered, the
+ *			structure must only be freed when this function is
+ *			called or later.
+ */
+struct irq_affinity_notify {
+	unsigned int irq;
+	struct kref kref;
+#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
+	struct work_struct work;
+#endif
+	void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
+	void (*release)(struct kref *ref);
+};
+
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
 
 extern cpumask_var_t irq_default_affinity;
@@ -240,6 +264,13 @@ extern int irq_can_set_affinity(unsigned int irq);
 extern int irq_select_affinity(unsigned int irq);
 
 extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
+extern int
+irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
+
+static inline void irq_run_affinity_notifiers(void)
+{
+	flush_scheduled_work();
+}
 #else /* CONFIG_SMP */
 
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
@@ -259,6 +290,16 @@ static inline int irq_set_affinity_hint(unsigned int irq,
 {
 	return -EINVAL;
 }
+
+static inline int
+irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
+{
+	return 0;
+}
+
+static inline void irq_run_affinity_notifiers(void)
+{
+}
 #endif /* CONFIG_SMP && CONFIG_GENERIC_HARDIRQS */
 
 #ifdef CONFIG_GENERIC_HARDIRQS
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 979c68c..5e0d2e4 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -8,6 +8,7 @@
  * For now it's included from <linux/irq.h>
  */
 
+struct irq_affinity_notify;
 struct proc_dir_entry;
 struct timer_rand_state;
 /**
@@ -24,6 +25,7 @@ struct timer_rand_state;
  * @last_unhandled:	aging timer for unhandled count
  * @irqs_unhandled:	stats field for spurious unhandled interrupts
  * @lock:		locking for SMP
+ * @affinity_notify:	context for notification of affinity changes
  * @pending_mask:	pending rebalanced interrupts
  * @threads_active:	number of irqaction threads currently running
  * @wait_for_threads:	wait queue for sync_irq to wait for threaded handlers
@@ -70,6 +72,7 @@ struct irq_desc {
 	raw_spinlock_t		lock;
 #ifdef CONFIG_SMP
 	const struct cpumask	*affinity_hint;
+	struct irq_affinity_notify *affinity_notify;
 #ifdef CONFIG_GENERIC_PENDING_IRQ
 	cpumask_var_t		pending_mask;
 #endif
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 91a5fa2..fb6525a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -134,6 +134,10 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
 		irq_set_thread_affinity(desc);
 	}
 #endif
+	if (desc->affinity_notify) {
+		kref_get(&desc->affinity_notify->kref);
+		schedule_work(&desc->affinity_notify->work);
+	}
 	desc->status |= IRQ_AFFINITY_SET;
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return 0;
@@ -155,6 +159,79 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
 
+static void irq_affinity_notify(struct work_struct *work)
+{
+	struct irq_affinity_notify *notify =
+		container_of(work, struct irq_affinity_notify, work);
+	struct irq_desc *desc = irq_to_desc(notify->irq);
+	cpumask_var_t cpumask;
+	unsigned long flags;
+
+	if (!desc)
+		goto out;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+		goto out;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+	if (desc->status & IRQ_MOVE_PENDING)
+		cpumask_copy(cpumask, desc->pending_mask);
+	else
+#endif
+		cpumask_copy(cpumask, desc->affinity);
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	notify->notify(notify, cpumask);
+
+	free_cpumask_var(cpumask);
+out:
+	kref_put(&notify->kref, notify->release);
+}
+
+/**
+ *	irq_set_affinity_notifier - control notification of IRQ affinity changes
+ *	@irq:		Interrupt for which to enable/disable notification
+ *	@notify:	Context for notification, or %NULL to disable
+ *			notification.  Function pointers must be initialised;
+ *			the other fields will be initialised by this function.
+ *
+ *	Must be called in process context.  Notification may only be enabled
+ *	after the IRQ is allocated but before it is bound with request_irq()
+ *	and must be disabled before the IRQ is freed using free_irq().
+ */
+int
+irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_affinity_notify *old_notify;
+	unsigned long flags;
+
+	/* The release function is promised process context */
+	might_sleep();
+
+	if (!desc)
+		return -EINVAL;
+
+	/* Complete initialisation of *notify */
+	if (notify) {
+		notify->irq = irq;
+		kref_init(&notify->kref);
+		INIT_WORK(&notify->work, irq_affinity_notify);
+	}
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	old_notify = desc->affinity_notify;
+	desc->affinity_notify = notify;
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	if (old_notify)
+		kref_put(&old_notify->kref, old_notify->release);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_set_affinity_notifier);
+
 #ifndef CONFIG_AUTO_IRQ_AFFINITY
 /*
  * Generic version of the affinity autoselector.
@@ -1004,6 +1081,10 @@ void free_irq(unsigned int irq, void *dev_id)
 	if (!desc)
 		return;
 
+#ifdef CONFIG_SMP
+	BUG_ON(desc->affinity_notify);
+#endif
+
 	chip_bus_lock(desc);
 	kfree(__free_irq(irq, dev_id));
 	chip_bus_sync_unlock(desc);
-- 
1.7.3.4




* [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping
  2011-01-04 19:37 [PATCH 0/2] IRQ affinity reverse-mapping Ben Hutchings
  2011-01-04 19:38 ` [PATCH 1/2] genirq: Add IRQ affinity notifiers Ben Hutchings
@ 2011-01-04 19:39 ` Ben Hutchings
  2011-01-04 21:17   ` Eric Dumazet
  1 sibling, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2011-01-04 19:39 UTC
  To: Thomas Gleixner
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
library functions to support a generic reverse-mapping from CPUs to
objects with affinity and the specific case where the objects are
IRQs.
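The core of the reverse-map can be sketched in isolation. Below is a hypothetical userspace model (toy names, a fixed 8-CPU machine, and `cpu ^ 1` standing in for the hyperthread-sibling step of the topology walk the real code does with topology_thread_cpumask() and friends):

```c
#include <assert.h>

#define TOY_NCPUS 8
#define TOY_DIST_INF 0xffff

/* Toy version of struct cpu_rmap: per-CPU nearest object index + distance */
struct toy_rmap {
	unsigned short index[TOY_NCPUS];
	unsigned short dist[TOY_NCPUS];
	unsigned int size;
};

static void toy_rmap_init(struct toy_rmap *r, unsigned int size)
{
	unsigned int cpu;

	r->size = size;
	/* Rota assignment: no affinity known yet, so spread objects
	 * round-robin at infinite distance. */
	for (cpu = 0; cpu < TOY_NCPUS; cpu++) {
		r->index[cpu] = cpu % size;
		r->dist[cpu] = TOY_DIST_INF;
	}
}

/* Object `obj` is now affine to the CPUs set in `mask` (one bit per CPU).
 * CPUs in the mask get distance 0; then each CPU whose toy "sibling"
 * (cpu ^ 1) has distance <= 1 inherits that sibling's object at
 * distance 1, mirroring cpu_rmap_copy_neigh() for the first level. */
static void toy_rmap_update(struct toy_rmap *r, unsigned short obj,
			    unsigned int mask)
{
	unsigned int cpu;

	/* Invalidate CPUs that used to map to this object */
	for (cpu = 0; cpu < TOY_NCPUS; cpu++)
		if (r->index[cpu] == obj)
			r->dist[cpu] = TOY_DIST_INF;

	for (cpu = 0; cpu < TOY_NCPUS; cpu++)
		if (mask & (1u << cpu)) {
			r->index[cpu] = obj;
			r->dist[cpu] = 0;
		}

	for (cpu = 0; cpu < TOY_NCPUS; cpu++) {
		unsigned int sib = cpu ^ 1;	/* toy HT sibling */

		if (r->dist[cpu] > 1 && r->dist[sib] <= 1) {
			r->index[cpu] = r->index[sib];
			r->dist[cpu] = 1;
		}
	}
}
```

The invariant is that every CPU always maps to some object: initially round-robin at infinite distance, then refined to distance 0 for CPUs in an object's affinity mask and to small distances for topological neighbours.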

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/cpu_rmap.h |   73 +++++++++++++
 lib/Kconfig              |    4 +
 lib/Makefile             |    2 +
 lib/cpu_rmap.c           |  262 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 341 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cpu_rmap.h
 create mode 100644 lib/cpu_rmap.c

diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
new file mode 100644
index 0000000..6e2f5ff
--- /dev/null
+++ b/include/linux/cpu_rmap.h
@@ -0,0 +1,73 @@
+/*
+ * cpu_rmap.h: CPU affinity reverse-map support
+ * Copyright 2010 Solarflare Communications Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/gfp.h>
+#include <linux/slab.h>
+
+/**
+ * struct cpu_rmap - CPU affinity reverse-map
+ * @near: For each CPU, the index and distance to the nearest object,
+ *      based on affinity masks
+ * @size: Number of objects to be reverse-mapped
+ * @used: Number of objects added
+ * @obj: Array of object pointers
+ */
+struct cpu_rmap {
+	struct {
+		u16     index;
+		u16     dist;
+	} near[NR_CPUS];
+	u16		size, used;
+	void		*obj[0];
+};
+#define CPU_RMAP_DIST_INF 0xffff
+
+extern struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags);
+
+/**
+ * free_cpu_rmap - free CPU affinity reverse-map
+ * @rmap: Reverse-map allocated with alloc_cpu_rmap(), or %NULL
+ */
+static inline void free_cpu_rmap(struct cpu_rmap *rmap)
+{
+	kfree(rmap);
+}
+
+extern int cpu_rmap_add(struct cpu_rmap *rmap, void *obj);
+extern int cpu_rmap_update(struct cpu_rmap *rmap, u16 index,
+			   const struct cpumask *affinity);
+
+static inline u16 cpu_rmap_lookup_index(struct cpu_rmap *rmap, unsigned int cpu)
+{
+	return rmap->near[cpu].index;
+}
+
+static inline void *cpu_rmap_lookup_obj(struct cpu_rmap *rmap, unsigned int cpu)
+{
+	return rmap->obj[rmap->near[cpu].index];
+}
+
+#ifdef CONFIG_GENERIC_HARDIRQS
+
+/**
+ * alloc_irq_cpu_rmap - allocate CPU affinity reverse-map for IRQs
+ * @size: Number of objects to be mapped
+ *
+ * Must be called in process context.
+ */
+static inline struct cpu_rmap *alloc_irq_cpu_rmap(unsigned int size)
+{
+	return alloc_cpu_rmap(size, GFP_KERNEL);
+}
+extern void free_irq_cpu_rmap(struct cpu_rmap *rmap);
+
+extern int irq_cpu_rmap_add(struct cpu_rmap *rmap, int irq);
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 3d498b2..f43cb2e 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -195,6 +195,10 @@ config DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
        bool "Disable obsolete cpumask functions" if DEBUG_PER_CPU_MAPS
        depends on EXPERIMENTAL && BROKEN
 
+config CPU_RMAP
+	bool
+	depends on SMP
+
 #
 # Netlink attribute parsing support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index 0248767..001b528 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o
 
 obj-$(CONFIG_AVERAGE) += average.o
 
+obj-$(CONFIG_CPU_RMAP) += cpu_rmap.o
+
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
 
diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
new file mode 100644
index 0000000..8f7f6c9
--- /dev/null
+++ b/lib/cpu_rmap.c
@@ -0,0 +1,262 @@
+/*
+ * cpu_rmap.c: CPU affinity reverse-map support
+ * Copyright 2010 Solarflare Communications Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#include <linux/cpu_rmap.h>
+#ifdef CONFIG_GENERIC_HARDIRQS
+#include <linux/interrupt.h>
+#endif
+#include <linux/module.h>
+
+/*
+ * These functions maintain a mapping from CPUs to some ordered set of
+ * objects with CPU affinities.  This can be seen as a reverse-map of
+ * CPU affinity.  However, we do not assume that the object affinities
+ * cover all CPUs in the system.  For those CPUs not directly covered
+ * by object affinities, we attempt to find a nearest object based on
+ * CPU topology.
+ */
+
+/**
+ * alloc_cpu_rmap - allocate CPU affinity reverse-map
+ * @size: Number of objects to be mapped
+ * @flags: Allocation flags e.g. %GFP_KERNEL
+ */
+struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags)
+{
+	struct cpu_rmap *rmap;
+	unsigned int cpu;
+
+	/* This is a silly number of objects, and we use u16 indices. */
+	if (size > 0xffff)
+		return NULL;
+
+	rmap = kzalloc(sizeof(*rmap) + size * sizeof(rmap->obj[0]), flags);
+	if (!rmap)
+		return NULL;
+
+	/* Initially assign CPUs to objects on a rota, since we have
+	 * no idea where the objects are.  Use infinite distance, so
+	 * any object with known distance is preferable.  Include the
+	 * CPUs that are not present/online, since we definitely want
+	 * any newly-hotplugged CPUs to have some object assigned.
+	 */
+	for_each_possible_cpu(cpu) {
+		rmap->near[cpu].index = cpu % size;
+		rmap->near[cpu].dist = CPU_RMAP_DIST_INF;
+	}
+
+	rmap->size = size;
+	return rmap;
+}
+EXPORT_SYMBOL(alloc_cpu_rmap);
+
+/* Reevaluate nearest object for given CPU, comparing with the given
+ * neighbours at the given distance.
+ */
+static bool cpu_rmap_copy_neigh(struct cpu_rmap *rmap, unsigned int cpu,
+				const struct cpumask *mask, u16 dist)
+{
+	int neigh;
+
+	for_each_cpu(neigh, mask) {
+		if (rmap->near[cpu].dist > dist &&
+		    rmap->near[neigh].dist <= dist) {
+			rmap->near[cpu].index = rmap->near[neigh].index;
+			rmap->near[cpu].dist = dist;
+			return true;
+		}
+	}
+	return false;
+}
+
+#ifdef DEBUG
+static void debug_print_rmap(const struct cpu_rmap *rmap, const char *prefix)
+{
+	unsigned index;
+	unsigned int cpu;
+
+	pr_info("cpu_rmap %p, %s:\n", rmap, prefix);
+
+	for_each_possible_cpu(cpu) {
+		index = rmap->near[cpu].index;
+		pr_info("cpu %d -> obj %u (distance %u)\n",
+			cpu, index, rmap->near[cpu].dist);
+	}
+}
+#else
+static inline void
+debug_print_rmap(const struct cpu_rmap *rmap, const char *prefix)
+{
+}
+#endif
+
+/**
+ * cpu_rmap_add - add object to a rmap
+ * @rmap: CPU rmap allocated with alloc_cpu_rmap()
+ * @obj: Object to add to rmap
+ *
+ * Return index of object.
+ */
+int cpu_rmap_add(struct cpu_rmap *rmap, void *obj)
+{
+	u16 index;
+
+	BUG_ON(rmap->used >= rmap->size);
+	index = rmap->used++;
+	rmap->obj[index] = obj;
+	return index;
+}
+EXPORT_SYMBOL(cpu_rmap_add);
+
+/**
+ * cpu_rmap_update - update CPU rmap following a change of object affinity
+ * @rmap: CPU rmap to update
+ * @index: Index of object whose affinity changed
+ * @affinity: New CPU affinity of object
+ */
+int cpu_rmap_update(struct cpu_rmap *rmap, u16 index,
+		    const struct cpumask *affinity)
+{
+	cpumask_var_t update_mask;
+	unsigned int cpu;
+
+	if (unlikely(!zalloc_cpumask_var(&update_mask, GFP_KERNEL)))
+		return -ENOMEM;
+
+	/* Invalidate distance for all CPUs for which this used to be
+	 * the nearest object.  Mark those CPUs for update.
+	 */
+	for_each_online_cpu(cpu) {
+		if (rmap->near[cpu].index == index) {
+			rmap->near[cpu].dist = CPU_RMAP_DIST_INF;
+			cpumask_set_cpu(cpu, update_mask);
+		}
+	}
+
+	debug_print_rmap(rmap, "after invalidating old distances");
+
+	/* Set distance to 0 for all CPUs in the new affinity mask.
+	 * Mark all CPUs within their NUMA nodes for update.
+	 */
+	for_each_cpu(cpu, affinity) {
+		rmap->near[cpu].index = index;
+		rmap->near[cpu].dist = 0;
+		cpumask_or(update_mask, update_mask,
+			   cpumask_of_node(cpu_to_node(cpu)));
+	}
+
+	debug_print_rmap(rmap, "after updating neighbours");
+
+	/* Update distances based on topology */
+	for_each_cpu(cpu, update_mask) {
+		if (cpu_rmap_copy_neigh(rmap, cpu,
+					topology_thread_cpumask(cpu), 1))
+			continue;
+		if (cpu_rmap_copy_neigh(rmap, cpu,
+					topology_core_cpumask(cpu), 2))
+			continue;
+		if (cpu_rmap_copy_neigh(rmap, cpu,
+					cpumask_of_node(cpu_to_node(cpu)), 3))
+			continue;
+		/* We could continue into NUMA node distances, but for now
+		 * we give up.
+		 */
+	}
+
+	debug_print_rmap(rmap, "after copying neighbours");
+
+	free_cpumask_var(update_mask);
+	return 0;
+}
+EXPORT_SYMBOL(cpu_rmap_update);
+
+#ifdef CONFIG_GENERIC_HARDIRQS
+
+/* Glue between IRQ affinity notifiers and CPU rmaps */
+
+struct irq_glue {
+	struct irq_affinity_notify notify;
+	struct cpu_rmap *rmap;
+	u16 index;
+};
+
+/**
+ * free_irq_cpu_rmap - free a CPU affinity reverse-map used for IRQs
+ * @rmap: Reverse-map allocated with alloc_irq_cpu_rmap(), or %NULL
+ *
+ * Must be called in process context, before freeing the IRQs, and
+ * without holding any locks required by global workqueue items.
+ */
+void free_irq_cpu_rmap(struct cpu_rmap *rmap)
+{
+	struct irq_glue *glue;
+	u16 index;
+
+	if (!rmap)
+		return;
+
+	for (index = 0; index < rmap->used; index++) {
+		glue = rmap->obj[index];
+		irq_set_affinity_notifier(glue->notify.irq, NULL);
+	}
+	irq_run_affinity_notifiers();
+
+	kfree(rmap);
+}
+EXPORT_SYMBOL(free_irq_cpu_rmap);
+
+static void
+irq_cpu_rmap_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
+{
+	struct irq_glue *glue =
+		container_of(notify, struct irq_glue, notify);
+	int rc;
+
+	rc = cpu_rmap_update(glue->rmap, glue->index, mask);
+	if (rc)
+		pr_warning("irq_cpu_rmap_notify: update failed: %d\n", rc);
+}
+
+static void irq_cpu_rmap_release(struct kref *ref)
+{
+	struct irq_glue *glue =
+		container_of(ref, struct irq_glue, notify.kref);
+	kfree(glue);
+}
+
+/**
+ * irq_cpu_rmap_add - add an IRQ to a CPU affinity reverse-map
+ * @rmap: The reverse-map
+ * @irq: The IRQ number
+ *
+ * This adds an IRQ affinity notifier that will update the reverse-map
+ * automatically.
+ *
+ * Must be called in process context, after the IRQ is allocated but
+ * before it is bound with request_irq().
+ */
+int irq_cpu_rmap_add(struct cpu_rmap *rmap, int irq)
+{
+	struct irq_glue *glue = kzalloc(sizeof(*glue), GFP_KERNEL);
+	int rc;
+
+	if (!glue)
+		return -ENOMEM;
+	glue->notify.notify = irq_cpu_rmap_notify;
+	glue->notify.release = irq_cpu_rmap_release;
+	glue->rmap = rmap;
+	glue->index = cpu_rmap_add(rmap, glue);
+	rc = irq_set_affinity_notifier(irq, &glue->notify);
+	if (rc)
+		kfree(glue);
+	return rc;
+}
+EXPORT_SYMBOL(irq_cpu_rmap_add);
+
+#endif /* CONFIG_GENERIC_HARDIRQS */
-- 
1.7.3.4




* Re: [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping
  2011-01-04 19:39 ` [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping Ben Hutchings
@ 2011-01-04 21:17   ` Eric Dumazet
  2011-01-04 21:23     ` Ben Hutchings
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-01-04 21:17 UTC
  To: Ben Hutchings
  Cc: Thomas Gleixner, David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Tuesday 4 January 2011 at 19:39 +0000, Ben Hutchings wrote:
> When initiating I/O on a multiqueue and multi-IRQ device, we may want
> to select a queue for which the response will be handled on the same
> or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
> library functions to support a generic reverse-mapping from CPUs to
> objects with affinity and the specific case where the objects are
> IRQs.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
>  include/linux/cpu_rmap.h |   73 +++++++++++++
>  lib/Kconfig              |    4 +
>  lib/Makefile             |    2 +
>  lib/cpu_rmap.c           |  262 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 341 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/cpu_rmap.h
>  create mode 100644 lib/cpu_rmap.c
> 
> diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
> new file mode 100644
> index 0000000..6e2f5ff
> --- /dev/null
> +++ b/include/linux/cpu_rmap.h
> @@ -0,0 +1,73 @@
> +/*
> + * cpu_rmap.h: CPU affinity reverse-map support
> + * Copyright 2010 Solarflare Communications Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation, incorporated herein by reference.
> + */
> +
> +#include <linux/cpumask.h>
> +#include <linux/gfp.h>
> +#include <linux/slab.h>
> +
> +/**
> + * struct cpu_rmap - CPU affinity reverse-map
> + * @near: For each CPU, the index and distance to the nearest object,
> + *      based on affinity masks
> + * @size: Number of objects to be reverse-mapped
> + * @used: Number of objects added
> + * @obj: Array of object pointers
> + */
> +struct cpu_rmap {
> +	struct {
> +		u16     index;
> +		u16     dist;
> +	} near[NR_CPUS];

This [NR_CPUS] is highly suspect.

Are you sure you cant use a per_cpu allocation here ?

> +	u16		size, used;
> +	void		*obj[0];
> +};
> +#define CPU_RMAP_DIST_INF 0xffff
> +


> +
> +/**
> + * alloc_cpu_rmap - allocate CPU affinity reverse-map
> + * @size: Number of objects to be mapped
> + * @flags: Allocation flags e.g. %GFP_KERNEL
> + */

I really doubt you need anything other than GFP_KERNEL.  (Especially if
you switch to a per_cpu alloc ;) )

> +struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags)
> +{
> +	struct cpu_rmap *rmap;
> +	unsigned int cpu;
> +
> +	/* This is a silly number of objects, and we use u16 indices. */
> +	if (size > 0xffff)
> +		return NULL;
> +
> +	rmap = kzalloc(sizeof(*rmap) + size * sizeof(rmap->obj[0]), flags);
> +	if (!rmap)
> +		return NULL;


* Re: [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping
  2011-01-04 21:17   ` Eric Dumazet
@ 2011-01-04 21:23     ` Ben Hutchings
  2011-01-04 21:45       ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2011-01-04 21:23 UTC
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Tue, 2011-01-04 at 22:17 +0100, Eric Dumazet wrote:
> On Tuesday 4 January 2011 at 19:39 +0000, Ben Hutchings wrote:
> > When initiating I/O on a multiqueue and multi-IRQ device, we may want
> > to select a queue for which the response will be handled on the same
> > or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
> > library functions to support a generic reverse-mapping from CPUs to
> > objects with affinity and the specific case where the objects are
> > IRQs.
[...]
> > +/**
> > + * struct cpu_rmap - CPU affinity reverse-map
> > + * @near: For each CPU, the index and distance to the nearest object,
> > + *      based on affinity masks
> > + * @size: Number of objects to be reverse-mapped
> > + * @used: Number of objects added
> > + * @obj: Array of object pointers
> > + */
> > +struct cpu_rmap {
> > +	struct {
> > +		u16     index;
> > +		u16     dist;
> > +	} near[NR_CPUS];
> 
> This [NR_CPUS] is highly suspect.
> 
> Are you sure you cant use a per_cpu allocation here ?

I think that would be a waste of space in shared caches, as this is
read-mostly.

> > +	u16		size, used;
> > +	void		*obj[0];
> > +};
> > +#define CPU_RMAP_DIST_INF 0xffff
> > +
> 
> 
> > +
> > +/**
> > + * alloc_cpu_rmap - allocate CPU affinity reverse-map
> > + * @size: Number of objects to be mapped
> > + * @flags: Allocation flags e.g. %GFP_KERNEL
> > + */
> 
> I really doubt you need other than GFP_KERNEL. (Especially if you switch
> to per_cpu alloc ;) )
[...]

I agree, but this is consistent with ~all other allocation functions.

Ben.




* Re: [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping
  2011-01-04 21:23     ` Ben Hutchings
@ 2011-01-04 21:45       ` Eric Dumazet
  2011-01-04 22:04         ` Ben Hutchings
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2011-01-04 21:45 UTC
  To: Ben Hutchings
  Cc: Thomas Gleixner, David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Tuesday 4 January 2011 at 21:23 +0000, Ben Hutchings wrote:
> On Tue, 2011-01-04 at 22:17 +0100, Eric Dumazet wrote:
> > On Tuesday 4 January 2011 at 19:39 +0000, Ben Hutchings wrote:
> > > When initiating I/O on a multiqueue and multi-IRQ device, we may want
> > > to select a queue for which the response will be handled on the same
> > > or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
> > > library functions to support a generic reverse-mapping from CPUs to
> > > objects with affinity and the specific case where the objects are
> > > IRQs.
> [...]
> > > +/**
> > > + * struct cpu_rmap - CPU affinity reverse-map
> > > + * @near: For each CPU, the index and distance to the nearest object,
> > > + *      based on affinity masks
> > > + * @size: Number of objects to be reverse-mapped
> > > + * @used: Number of objects added
> > > + * @obj: Array of object pointers
> > > + */
> > > +struct cpu_rmap {
> > > +	struct {
> > > +		u16     index;
> > > +		u16     dist;
> > > +	} near[NR_CPUS];
> > 
> > This [NR_CPUS] is highly suspect.
> > 
> > Are you sure you cant use a per_cpu allocation here ?
> 
> I think that would be a waste of space in shared caches, as this is
> read-mostly.

This is a slow path, unless I misunderstood the intent.

Cache lines don't matter. I was not concerned about speed but about
memory needs.

NR_CPUS can be 4096 on some distros, which means a 32Kbyte allocation.

Really, you'll need very strong arguments to introduce an
[NR_CPUS] array in the kernel today.


* Re: [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping
  2011-01-04 21:45       ` Eric Dumazet
@ 2011-01-04 22:04         ` Ben Hutchings
  2011-01-04 22:19           ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2011-01-04 22:04 UTC
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Tue, 2011-01-04 at 22:45 +0100, Eric Dumazet wrote:
> On Tuesday 4 January 2011 at 21:23 +0000, Ben Hutchings wrote:
> > On Tue, 2011-01-04 at 22:17 +0100, Eric Dumazet wrote:
> > > On Tuesday 4 January 2011 at 19:39 +0000, Ben Hutchings wrote:
> > > > When initiating I/O on a multiqueue and multi-IRQ device, we may want
> > > > to select a queue for which the response will be handled on the same
> > > > or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
> > > > library functions to support a generic reverse-mapping from CPUs to
> > > > objects with affinity and the specific case where the objects are
> > > > IRQs.
> > [...]
> > > > +/**
> > > > + * struct cpu_rmap - CPU affinity reverse-map
> > > > + * @near: For each CPU, the index and distance to the nearest object,
> > > > + *      based on affinity masks
> > > > + * @size: Number of objects to be reverse-mapped
> > > > + * @used: Number of objects added
> > > > + * @obj: Array of object pointers
> > > > + */
> > > > +struct cpu_rmap {
> > > > +	struct {
> > > > +		u16     index;
> > > > +		u16     dist;
> > > > +	} near[NR_CPUS];
> > > 
> > > This [NR_CPUS] is highly suspect.
> > > 
> > > Are you sure you cant use a per_cpu allocation here ?
> > 
> > I think that would be a waste of space in shared caches, as this is
> > read-mostly.
> 
> This is a slow path, unless I misunderstood the intent.

get_rps_cpu() will need to read from an arbitrary entry in cpu_rmap (not
the current CPU's entry) for each new flow and for each flow that went
idle for a while.  That's not fast path but it is part of the data path,
not the control path.

> Cache lines dont matter. I was not concerned about speed but memory
> needs.
> 
> NR_CPUS can be 4096 on some distros, that means a 32Kbyte allocation.
> 
> Really, you'll have to have very strong arguments to introduce an
> [NR_CPUS] array in the kernel today.

I could replace this with a pointer to an array of size
num_possible_cpus().  But I think per_cpu is wrong here.

Ben.



* Re: [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping
  2011-01-04 22:04         ` Ben Hutchings
@ 2011-01-04 22:19           ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2011-01-04 22:19 UTC
  To: Ben Hutchings
  Cc: Thomas Gleixner, David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Tuesday 4 January 2011 at 22:04 +0000, Ben Hutchings wrote:

> get_rps_cpu() will need to read from an arbitrary entry in cpu_rmap (not
> the current CPU's entry) for each new flow and for each flow that went
> idle for a while.  That's not fast path but it is part of the data path,
> not the control path.
> 

Hmm, I call this fast path :(

> > Cache lines don't matter. I was not concerned about speed but memory
> > needs.
> > 
> > NR_CPUS can be 4096 on some distros, that means a 32Kbyte allocation.
> > 
> > Really, you'll have to have very strong arguments to introduce an
> > [NR_CPUS] array in the kernel today.
> 
> I could replace this with a pointer to an array of size
> num_possible_cpus().  But I think per_cpu is wrong here.

Yes, a dynamic array is acceptable.

You probably mean nr_cpu_ids.


* Re: [PATCH 1/2] genirq: Add IRQ affinity notifiers
  2011-01-04 19:38 ` [PATCH 1/2] genirq: Add IRQ affinity notifiers Ben Hutchings
@ 2011-01-14 19:47   ` Thomas Gleixner
  2011-01-14 20:06     ` Ben Hutchings
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2011-01-14 19:47 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Tue, 4 Jan 2011, Ben Hutchings wrote:
> +/**
> + * struct irq_affinity_notify - context for notification of IRQ affinity changes
> + * @irq:		Interrupt to which notification applies
> + * @kref:		Reference count, for internal use
> + * @work:		Work item, for internal use
> + * @notify:		Function to be called on change.  This will be
> + *			called in process context.
> + * @release:		Function to be called on release.  This will be
> + *			called in process context.  Once registered, the
> + *			structure must only be freed when this function is
> + *			called or later.
> + */
> +struct irq_affinity_notify {
> +        unsigned int irq;
> +        struct kref kref;
> +#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)

The whole affinity thing is SMP and GENERIC_HARDIRQS only anyway, so
what's the point of this ifdeffery ?

> +        struct work_struct work;
> +#endif
> +        void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
> +        void (*release)(struct kref *ref);
> +};
> +

> +/**
> + *	irq_set_affinity_notifier - control notification of IRQ affinity changes
> + *	@irq:		Interrupt for which to enable/disable notification
> + *	@notify:	Context for notification, or %NULL to disable
> + *			notification.  Function pointers must be initialised;
> + *			the other fields will be initialised by this function.
> + *
> + *	Must be called in process context.  Notification may only be enabled
> + *	after the IRQ is allocated but before it is bound with request_irq()

Why? And if there is that restriction, then it needs to be
checked. But I don't see why this is necessary.

> + *	and must be disabled before the IRQ is freed using free_irq().
> + */

> +#ifdef CONFIG_SMP
> +	BUG_ON(desc->affinity_notify);

We should be nice here and just WARN and fixup the wreckage by
uninstalling it.

Thanks,

	tglx


* Re: [PATCH 1/2] genirq: Add IRQ affinity notifiers
  2011-01-14 19:47   ` Thomas Gleixner
@ 2011-01-14 20:06     ` Ben Hutchings
  2011-01-14 20:40       ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2011-01-14 20:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Fri, 2011-01-14 at 20:47 +0100, Thomas Gleixner wrote:
> On Tue, 4 Jan 2011, Ben Hutchings wrote:
> > +/**
> > + * struct irq_affinity_notify - context for notification of IRQ affinity changes
> > + * @irq:		Interrupt to which notification applies
> > + * @kref:		Reference count, for internal use
> > + * @work:		Work item, for internal use
> > + * @notify:		Function to be called on change.  This will be
> > + *			called in process context.
> > + * @release:		Function to be called on release.  This will be
> > + *			called in process context.  Once registered, the
> > + *			structure must only be freed when this function is
> > + *			called or later.
> > + */
> > +struct irq_affinity_notify {
> > +        unsigned int irq;
> > +        struct kref kref;
> > +#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
> 
> The whole affinity thing is SMP and GENERIC_HARDIRQS only anyway, so
> what's the point of this ifdeffery ?

The intent is that code using this can be compiled even if those config
options are not set.  The work_struct is not needed in that case.  I
think this is probably pointless though.

> > +        struct work_struct work;
> > +#endif
> > +        void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
> > +        void (*release)(struct kref *ref);
> > +};
> > +
> 
> > +/**
> > + *	irq_set_affinity_notifier - control notification of IRQ affinity changes
> > + *	@irq:		Interrupt for which to enable/disable notification
> > + *	@notify:	Context for notification, or %NULL to disable
> > + *			notification.  Function pointers must be initialised;
> > + *			the other fields will be initialised by this function.
> > + *
> > + *	Must be called in process context.  Notification may only be enabled
> > + *	after the IRQ is allocated but before it is bound with request_irq()
> 
> Why? And if there is that restriction, then it needs to be
> checked. But I don't see why this is necessary.

Which restriction?

> > + *	and must be disabled before the IRQ is freed using free_irq().
> > + */
> 
> > +#ifdef CONFIG_SMP
> > +	BUG_ON(desc->affinity_notify);
> 
> We should be nice here and just WARN and fixup the wreckage by
> uninstalling it.

OK.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH 1/2] genirq: Add IRQ affinity notifiers
  2011-01-14 20:06     ` Ben Hutchings
@ 2011-01-14 20:40       ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2011-01-14 20:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Tom Herbert, linux-kernel, netdev,
	linux-net-drivers

On Fri, 14 Jan 2011, Ben Hutchings wrote:
> On Fri, 2011-01-14 at 20:47 +0100, Thomas Gleixner wrote:
> > > +#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
> > 
> > The whole affinity thing is SMP and GENERIC_HARDIRQS only anyway, so
> > what's the point of this ifdeffery ?
> 
> The intent is that code using this can be compiled even if those config
> options are not set.  The work_struct is not needed in that case.  I
> think this is probably pointless though.

Yup, work_struct is defined for the !SMP and !GENERIC_HARDIRQS case as
well :)
 
> > > +        struct work_struct work;
> > > +#endif
> > > +        void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
> > > +        void (*release)(struct kref *ref);
> > > +};
> > > +
> > 
> > > +/**
> > > + *	irq_set_affinity_notifier - control notification of IRQ affinity changes
> > > + *	@irq:		Interrupt for which to enable/disable notification
> > > + *	@notify:	Context for notification, or %NULL to disable
> > > + *			notification.  Function pointers must be initialised;
> > > + *			the other fields will be initialised by this function.
> > > + *
> > > + *	Must be called in process context.  Notification may only be enabled
> > > + *	after the IRQ is allocated but before it is bound with request_irq()
> > 
> > Why? And if there is that restriction, then it needs to be
> > checked. But I don't see why this is necessary.
> 
> Which restriction?

  Notification may only be enabled after the IRQ is allocated but
  before it is bound with request_irq()

After the IRQ is allocated is obvious, but why does it need to be done
_before_ request_irq()?

Thanks,

	tglx


end of thread, other threads:[~2011-01-14 20:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-04 19:37 [PATCH 0/2] IRQ affinity reverse-mapping Ben Hutchings
2011-01-04 19:38 ` [PATCH 1/2] genirq: Add IRQ affinity notifiers Ben Hutchings
2011-01-14 19:47   ` Thomas Gleixner
2011-01-14 20:06     ` Ben Hutchings
2011-01-14 20:40       ` Thomas Gleixner
2011-01-04 19:39 ` [PATCH 2/2] lib: cpu_rmap: CPU affinity reverse-mapping Ben Hutchings
2011-01-04 21:17   ` Eric Dumazet
2011-01-04 21:23     ` Ben Hutchings
2011-01-04 21:45       ` Eric Dumazet
2011-01-04 22:04         ` Ben Hutchings
2011-01-04 22:19           ` Eric Dumazet
