* [PATCH] powerpc: Improve IRQ radix tree locking
From: Benjamin Herrenschmidt @ 2006-08-25 6:04 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev list
When reworking the powerpc irq code, I figured out that we were using
the radix tree in a racy way. As a temporary fix, I put a spinlock in
there. However, this can have a significant performance impact. This
patch reworks that to use a smarter technique based on the observation
that what we need is effectively a rwlock with extremely rare writers
(thus optimized for the read path).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Index: linux-work/arch/powerpc/kernel/irq.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/irq.c 2006-08-25 14:08:41.000000000 +1000
+++ linux-work/arch/powerpc/kernel/irq.c 2006-08-25 15:56:17.000000000 +1000
@@ -322,7 +322,8 @@ EXPORT_SYMBOL(do_softirq);
static LIST_HEAD(irq_hosts);
static spinlock_t irq_big_lock = SPIN_LOCK_UNLOCKED;
-
+static DEFINE_PER_CPU(unsigned int, irq_radix_reader);
+static unsigned int irq_radix_writer;
struct irq_map_entry irq_map[NR_IRQS];
static unsigned int irq_virq_count = NR_IRQS;
static struct irq_host *irq_default_host;
@@ -455,6 +456,60 @@ void irq_set_virq_count(unsigned int cou
irq_virq_count = count;
}
+/* radix tree not lockless safe ! we use a brlock-type mechanism
+ * for now, until we can use a lockless radix tree
+ */
+static void irq_radix_wrlock(unsigned long *flags)
+{
+ unsigned int cpu, ok;
+
+ spin_lock_irqsave(&irq_big_lock, *flags);
+ irq_radix_writer = 1;
+ do {
+ ok = 1;
+ smp_mb();
+ for_each_possible_cpu(cpu) {
+ if (per_cpu(irq_radix_reader, cpu)) {
+ ok = 0;
+ break;
+ }
+ }
+ if (!ok)
+ cpu_relax();
+ } while(!ok);
+}
+
+static void irq_radix_wrunlock(unsigned long flags)
+{
+ smp_wmb();
+ irq_radix_writer = 0;
+ spin_unlock_irqrestore(&irq_big_lock, flags);
+}
+
+static void irq_radix_rdlock(unsigned long *flags)
+{
+ local_irq_save(*flags);
+ if (irq_radix_writer)
+ goto slow;
+ __get_cpu_var(irq_radix_reader) = 1;
+ smp_mb();
+ if (irq_radix_writer == 0)
+ return;
+ __get_cpu_var(irq_radix_reader) = 0;
+ smp_wmb();
+ slow:
+ spin_lock(&irq_big_lock);
+ __get_cpu_var(irq_radix_reader) = 1;
+ spin_unlock(&irq_big_lock);
+}
+
+static void irq_radix_rdunlock(unsigned long flags)
+{
+ __get_cpu_var(irq_radix_reader) = 0;
+ local_irq_restore(flags);
+}
+
+
unsigned int irq_create_mapping(struct irq_host *host,
irq_hw_number_t hwirq)
{
@@ -604,13 +659,9 @@ void irq_dispose_mapping(unsigned int vi
/* Check if radix tree allocated yet */
if (host->revmap_data.tree.gfp_mask == 0)
break;
- /* XXX radix tree not safe ! remove lock whem it becomes safe
- * and use some RCU sync to make sure everything is ok before we
- * can re-use that map entry
- */
- spin_lock_irqsave(&irq_big_lock, flags);
+ irq_radix_wrlock(&flags);
radix_tree_delete(&host->revmap_data.tree, hwirq);
- spin_unlock_irqrestore(&irq_big_lock, flags);
+ irq_radix_wrunlock(flags);
break;
}
@@ -677,25 +728,24 @@ unsigned int irq_radix_revmap(struct irq
if (tree->gfp_mask == 0)
return irq_find_mapping(host, hwirq);
- /* XXX Current radix trees are NOT SMP safe !!! Remove that lock
- * when that is fixed (when Nick's patch gets in
- */
- spin_lock_irqsave(&irq_big_lock, flags);
-
/* Now try to resolve */
+ irq_radix_rdlock(&flags);
ptr = radix_tree_lookup(tree, hwirq);
+ irq_radix_rdunlock(flags);
+
/* Found it, return */
if (ptr) {
virq = ptr - irq_map;
- goto bail;
+ return virq;
}
/* If not there, try to insert it */
virq = irq_find_mapping(host, hwirq);
- if (virq != NO_IRQ)
+ if (virq != NO_IRQ) {
+ irq_radix_wrlock(&flags);
radix_tree_insert(tree, hwirq, &irq_map[virq]);
- bail:
- spin_unlock_irqrestore(&irq_big_lock, flags);
+ irq_radix_wrunlock(flags);
+ }
return virq;
}
@@ -806,12 +856,12 @@ static int irq_late_init(void)
struct irq_host *h;
unsigned long flags;
- spin_lock_irqsave(&irq_big_lock, flags);
+ irq_radix_wrlock(&flags);
list_for_each_entry(h, &irq_hosts, link) {
if (h->revmap_type == IRQ_HOST_MAP_TREE)
INIT_RADIX_TREE(&h->revmap_data.tree, GFP_ATOMIC);
}
- spin_unlock_irqrestore(&irq_big_lock, flags);
+ irq_radix_wrunlock(flags);
return 0;
}
* Re: [PATCH] powerpc: Improve IRQ radix tree locking
From: Milton Miller @ 2006-08-25 14:42 UTC (permalink / raw)
To: linuxppc-dev, Benjamin Herrenschmidt; +Cc: Paul Mackerras, Anton Blanchard
On Fri Aug 25 16:04:52 EST 2006, Benjamin Herrenschmidt wrote:
> When reworking the powerpc irq code, I figured out that we were using
> the radix tree in a racy way. As a temporary fix, I put a spinlock in
> there. However, this can have a significant performance impact. This
> patch reworks that to use a smarter technique based on the observation
> that what we need is effectively a rwlock with extremely rare writers
> (thus optimized for the read path).
> +static void irq_radix_wrlock(unsigned long *flags)
> +{
> + unsigned int cpu, ok;
> +
> + spin_lock_irqsave(&irq_big_lock, *flags);
> + irq_radix_writer = 1;
I think the smp_mb() is needed after here
> + do {
> + ok = 1;
> + smp_mb();
but not every time we poll the cpus. We are only
updating our local variable ok, which we have not
given to anyone else. It could even be a register.
> + for_each_possible_cpu(cpu) {
> + if (per_cpu(irq_radix_reader, cpu)) {
> + ok = 0;
> + break;
> + }
> + }
> + if (!ok)
> + cpu_relax();
Hmmm, the gcc barrier is conditional. How about putting
barrier() before the ok = 1 at the top, to avoid any optimization
blockage while still telling gcc it must re-read every time?
> + } while(!ok);
> +}
> +
Oh, and how about some (un)likely in irq_radix_rdlock? It
could happen by default, but why not make our expected path explicit?
And, while I'm thinking of changes, when setting ok = 0 above, add:
#ifdef CONFIG_SPINLOCK_DEBUG
BUG_ON(cpu == smp_processor_id());
#endif
Hmm... just thought of something: we are spinning under the
irq lock, waiting on some other thread, but not
telling the hypervisor. Should we confer our execution
over to the cpu we are waiting on? I know, it sounds ugly
to call that code. I guess we would want a
cpu_waiting_on(cpu) that calls cpu_relax() and/or the hypervisor.
milton
* Re: [PATCH] powerpc: Improve IRQ radix tree locking
From: Benjamin Herrenschmidt @ 2006-08-25 22:54 UTC (permalink / raw)
To: Milton Miller; +Cc: linuxppc-dev, Paul Mackerras, Anton Blanchard
On Fri, 2006-08-25 at 09:42 -0500, Milton Miller wrote:
> On Fri Aug 25 16:04:52 EST 2006, Benjamin Herrenschmidt wrote:
> > When reworking the powerpc irq code, I figured out that we were using
> > the radix tree in a racy way. As a temporary fix, I put a spinlock in
> > there. However, this can have a significant performance impact. This
> > patch reworks that to use a smarter technique based on the observation
> > that what we need is effectively a rwlock with extremely rare writers
> > (thus optimized for the read path).
>
>
> > +static void irq_radix_wrlock(unsigned long *flags)
> > +{
> > + unsigned int cpu, ok;
> > +
> > + spin_lock_irqsave(&irq_big_lock, *flags);
> > + irq_radix_writer = 1;
>
> I think the smp_mb() is needed after here
>
> > + do {
> > + ok = 1;
> > + smp_mb();
>
> but not every time we poll the cpus. We are only
> updating our local variable ok, which we have not
> given to anyone else. It could even be a register.
Doesn't really matter, the write case is extremely rare and not at all
performance critical. If I don't put the mb in the loop, I wonder if the
compiler might start optimizing out the reads... and I don't want to add
a volatile in there. I feel safer with the smp_mb() in there.
> > + for_each_possible_cpu(cpu) {
> > + if (per_cpu(irq_radix_reader, cpu)) {
> > + ok = 0;
> > + break;
> > + }
> > + }
> > + if (!ok)
> > + cpu_relax();
>
> Hmmm, the gcc barrier is conditional. How about putting
> barrier() before the ok = 1 at the top, to avoid any optimization
> blockage while still telling gcc it must re-read every time?
Yeah, well, as I said, smp_mb() hits both cases and this isn't perf.
critical...
> > + } while(!ok);
> > +}
> > +
>
> Oh, and how about some (un)likely in irq_radix_rdlock? It
> could happen by default, but why not make our expected path explicit?
Yup, that would make sense. I might also drop the first test of
irq_radix_writer. This will penalize the contention case a bit more
vs. the read case, but that's exactly what I want...
> And, while I'm thinking of changes, when setting ok = 0 above, add:
> #ifdef CONFIG_SPINLOCK_DEBUG
> BUG_ON(cpu == smp_processor_id());
> #endif
Do you see any case where I might actually hit that ? :)
> Hmm... just thought of something: we are spinning under the
> irq lock, waiting on some other thread, but not
> telling the hypervisor. Should we confer our execution
> over to the cpu we are waiting on? I know, it sounds ugly
> to call that code. I guess we would want a
> cpu_waiting_on(cpu) that calls cpu_relax() and/or the hypervisor.
I'm not sure this will really matter in real life. The write case is
really so rare, and the contention case even more so...
Ben.