* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-23 23:49 ` Suresh Siddha
@ 2012-05-24 1:40 ` Dimitri Sivanich
2012-05-24 14:37 ` Dimitri Sivanich
2012-05-24 14:53 ` Thomas Gleixner
2 siblings, 0 replies; 25+ messages in thread
From: Dimitri Sivanich @ 2012-05-24 1:40 UTC (permalink / raw)
To: Suresh Siddha
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Wed, May 23, 2012 at 04:49:29PM -0700, Suresh Siddha wrote:
> On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > OK. Hopefully this covers it.
>
> Sorry No. Now you will understand why Thomas wanted detailed changelog.
> I found one more issue with the help of your new modification to the
> changelog.
> > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> >
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.
> >
> > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > irq, but panic when accessing irq_cfg.
> >
> > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
>
> So, what happens if the irq_desc gets freed by the destroy_irq() in the
> sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> irq desc memory! Right?
I don't believe I ever hit that case, but yes, I see the possibility, so
we have at least 3 holes here.
My test (other than repeated boots) was to repeatedly insert and remove a
module that sets up irq's on every cpu. Without my patch it panic'ed fairly
quickly. With the patch it never failed.
>
> May we should really do something like the appended (untested patch)?
> Can you please review and give this a try? Let me review a bit more to
> see if this really fixes the issue.
I wonder how many instances of smp_irq_move_cleanup_interrupt() might
run simultaneously on a large system? Would there be any issue with lock
pressure from holding the vector_lock through the entire loop?
I can probably test the code (after further review), but the only case I
saw through testing was the create_irq_nr case.
>
> Thanks.
> ---
> arch/x86/kernel/apic/io_apic.c | 19 +++++++++----------
> 1 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index ffdc152..81f4cab 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2295,6 +2295,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> exit_idle();
>
> me = smp_processor_id();
> +
> + raw_spin_lock(&vector_lock);
> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> unsigned int irq;
> unsigned int irr;
> @@ -2310,17 +2312,16 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> continue;
>
> cfg = irq_cfg(irq);
> - raw_spin_lock(&desc->lock);
>
> /*
> * Check if the irq migration is in progress. If so, we
> * haven't received the cleanup request yet for this irq.
> */
> if (cfg->move_in_progress)
> - goto unlock;
> + continue;
>
> if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
> - goto unlock;
> + continue;
>
> irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
> /*
> @@ -2332,12 +2333,11 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> */
> if (irr & (1 << (vector % 32))) {
> apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
> - goto unlock;
> + continue;
> }
> __this_cpu_write(vector_irq[vector], -1);
> -unlock:
> - raw_spin_unlock(&desc->lock);
> }
> + raw_spin_unlock(&vector_lock);
>
> irq_exit();
> }
> @@ -2986,17 +2986,16 @@ unsigned int create_irq_nr(unsigned int from, int node)
> return 0;
> }
>
> + irq_set_chip_data(irq, cfg);
> raw_spin_lock_irqsave(&vector_lock, flags);
> if (!__assign_irq_vector(irq, cfg, apic->target_cpus()))
> ret = irq;
> raw_spin_unlock_irqrestore(&vector_lock, flags);
>
> - if (ret) {
> - irq_set_chip_data(irq, cfg);
> + if (ret)
> irq_clear_status_flags(irq, IRQ_NOREQUEST);
> - } else {
> + else
> free_irq_at(irq, cfg);
> - }
> return ret;
> }
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-23 23:49 ` Suresh Siddha
2012-05-24 1:40 ` Dimitri Sivanich
@ 2012-05-24 14:37 ` Dimitri Sivanich
2012-05-24 18:19 ` Suresh Siddha
2012-05-24 14:53 ` Thomas Gleixner
2 siblings, 1 reply; 25+ messages in thread
From: Dimitri Sivanich @ 2012-05-24 14:37 UTC (permalink / raw)
To: Suresh Siddha
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Wed, May 23, 2012 at 04:49:29PM -0700, Suresh Siddha wrote:
> On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > OK. Hopefully this covers it.
>
> Sorry No. Now you will understand why Thomas wanted detailed changelog.
> I found one more issue with the help of your new modification to the
> changelog.
>
> > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> >
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.
> >
> > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > irq, but panic when accessing irq_cfg.
> >
> > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
>
> So, what happens if the irq_desc gets freed by the destroy_irq() in the
> sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> irq desc memory! Right?
And speaking of possible holes in destroy_irq()..
What happens if we're running __assign_irq_vector() (say we're changing irq
affinity), and on another cpu we had just run through __clear_irq_vector()
via destroy_irq(). Now destroy_irq() is going to call
free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
__assign_irq_vector goes to access irq_cfg (cfg->vector or
cfg->move_in_progress, for instance), which was already freed.
I'm not sure if this can happen, but just eyeballing it, it does look that
that way.
>
> May we should really do something like the appended (untested patch)?
> Can you please review and give this a try? Let me review a bit more to
> see if this really fixes the issue.
>
> Thanks.
> ---
> arch/x86/kernel/apic/io_apic.c | 19 +++++++++----------
> 1 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index ffdc152..81f4cab 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2295,6 +2295,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> exit_idle();
>
> me = smp_processor_id();
> +
> + raw_spin_lock(&vector_lock);
> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> unsigned int irq;
> unsigned int irr;
> @@ -2310,17 +2312,16 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> continue;
>
> cfg = irq_cfg(irq);
> - raw_spin_lock(&desc->lock);
>
> /*
> * Check if the irq migration is in progress. If so, we
> * haven't received the cleanup request yet for this irq.
> */
> if (cfg->move_in_progress)
> - goto unlock;
> + continue;
>
> if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
> - goto unlock;
> + continue;
>
> irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
> /*
> @@ -2332,12 +2333,11 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> */
> if (irr & (1 << (vector % 32))) {
> apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
> - goto unlock;
> + continue;
> }
> __this_cpu_write(vector_irq[vector], -1);
> -unlock:
> - raw_spin_unlock(&desc->lock);
> }
> + raw_spin_unlock(&vector_lock);
>
> irq_exit();
> }
> @@ -2986,17 +2986,16 @@ unsigned int create_irq_nr(unsigned int from, int node)
> return 0;
> }
>
> + irq_set_chip_data(irq, cfg);
> raw_spin_lock_irqsave(&vector_lock, flags);
> if (!__assign_irq_vector(irq, cfg, apic->target_cpus()))
> ret = irq;
> raw_spin_unlock_irqrestore(&vector_lock, flags);
>
> - if (ret) {
> - irq_set_chip_data(irq, cfg);
> + if (ret)
> irq_clear_status_flags(irq, IRQ_NOREQUEST);
> - } else {
> + else
> free_irq_at(irq, cfg);
> - }
> return ret;
> }
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-24 14:37 ` Dimitri Sivanich
@ 2012-05-24 18:19 ` Suresh Siddha
2012-05-24 19:16 ` Thomas Gleixner
0 siblings, 1 reply; 25+ messages in thread
From: Suresh Siddha @ 2012-05-24 18:19 UTC (permalink / raw)
To: Dimitri Sivanich
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Thu, 2012-05-24 at 09:37 -0500, Dimitri Sivanich wrote:
> And speaking of possible holes in destroy_irq()..
>
> What happens if we're running __assign_irq_vector() (say we're changing irq
> affinity), and on another cpu we had just run through __clear_irq_vector()
> via destroy_irq(). Now destroy_irq() is going to call
> free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
> __assign_irq_vector goes to access irq_cfg (cfg->vector or
> cfg->move_in_progress, for instance), which was already freed.
>
> I'm not sure if this can happen, but just eyeballing it, it does look that
> that way.
>
I wanted to say, irq desc is locked when we change the irq affinity,
which calls assign_irq_vector() and friends, so this should be fine.
BUT NO. I don't see any reference counts being maintained when we do
irq_to_desc(). So locking/unlocking that desc pointer is bogus when
destroy_irq() can go ahead and free the desc in parallel.
So, SPARSE_IRQ looks terribly broken! Yinghai, Thomas?
thanks,
suresh
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-24 18:19 ` Suresh Siddha
@ 2012-05-24 19:16 ` Thomas Gleixner
2012-05-26 0:23 ` Suresh Siddha
0 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2012-05-24 19:16 UTC (permalink / raw)
To: Suresh Siddha
Cc: Dimitri Sivanich, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Thu, 24 May 2012, Suresh Siddha wrote:
> On Thu, 2012-05-24 at 09:37 -0500, Dimitri Sivanich wrote:
> > And speaking of possible holes in destroy_irq()..
> >
> > What happens if we're running __assign_irq_vector() (say we're changing irq
> > affinity), and on another cpu we had just run through __clear_irq_vector()
> > via destroy_irq(). Now destroy_irq() is going to call
> > free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
> > __assign_irq_vector goes to access irq_cfg (cfg->vector or
> > cfg->move_in_progress, for instance), which was already freed.
> >
> > I'm not sure if this can happen, but just eyeballing it, it does look that
> > that way.
> >
>
> I wanted to say, irq desc is locked when we change the irq affinity,
> which calls assign_irq_vector() and friends, so this should be fine.
>
> BUT NO. I don't see any reference counts being maintained when we do
> irq_to_desc(). So locking/unlocking that desc pointer is bogus when
> destroy_irq() can go ahead and free the desc in parallel.
>
> So, SPARSE_IRQ looks terribly broken! Yinghai, Thomas?
Yes, we need refcounts for that. We talked about that before, but then
the argument was against it was that all that code is serialized
already, so no need. How wrong :)
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-24 19:16 ` Thomas Gleixner
@ 2012-05-26 0:23 ` Suresh Siddha
2012-05-26 10:18 ` Thomas Gleixner
0 siblings, 1 reply; 25+ messages in thread
From: Suresh Siddha @ 2012-05-26 0:23 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Dimitri Sivanich, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
> On Thu, 24 May 2012, Suresh Siddha wrote:
> > On Thu, 2012-05-24 at 09:37 -0500, Dimitri Sivanich wrote:
> > > And speaking of possible holes in destroy_irq()..
> > >
> > > What happens if we're running __assign_irq_vector() (say we're changing irq
> > > affinity), and on another cpu we had just run through __clear_irq_vector()
> > > via destroy_irq(). Now destroy_irq() is going to call
> > > free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
> > > __assign_irq_vector goes to access irq_cfg (cfg->vector or
> > > cfg->move_in_progress, for instance), which was already freed.
> > >
> > > I'm not sure if this can happen, but just eyeballing it, it does look that
> > > that way.
> > >
> >
> > I wanted to say, irq desc is locked when we change the irq affinity,
> > which calls assign_irq_vector() and friends, so this should be fine.
> >
> > BUT NO. I don't see any reference counts being maintained when we do
> > irq_to_desc(). So locking/unlocking that desc pointer is bogus when
> > destroy_irq() can go ahead and free the desc in parallel.
> >
> > So, SPARSE_IRQ looks terribly broken! Yinghai, Thomas?
>
> Yes, we need refcounts for that. We talked about that before, but then
> the argument was against it was that all that code is serialized
> already, so no need. How wrong :)
I looked a bit more at this and it looks like unregister_handler_proc()
(that gets called from the free_irq path) will ensure that all the
existing proc writers modifying the irq affinity from
the /proc/irq/N/smp_affinity interface will be completed. So Dimitri,
the code path changing irq affinity looks fine.
We also do synchronize_irq() in __free_irq(). Followed by
masking/shutdown the irq etc and the destroy_irq() which happens after
all this is probably getting lucky because of timing reasons (any other
cpu handling that irq would have completed the irq handling and the
corresponding reference to the 'desc' etc). But we should be also doing
synchronize_rcu()/synchronize_sched() to ensure that other cpu's are not
in the irq handling paths having a reference to the 'desc'.
There are other (not-so common) irq desc references, like in the
show_interrupts() (cat /proc/interrupts path) etc, that does things like
this in the process context:
desc = irq_to_desc(i);
if (!desc)
return 0;
raw_spin_lock_irqsave(&desc->lock, flags);
May be we should introduce something like
get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
irq desc with pre-emption/irq's disabled and lock it etc. And the
synchronize_sched() will enable the destroy_irq()/free_desc() to free it
safely etc.
thanks,
suresh
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-26 0:23 ` Suresh Siddha
@ 2012-05-26 10:18 ` Thomas Gleixner
2012-05-27 1:41 ` Jiang Liu
2012-05-30 13:46 ` Dimitri Sivanich
0 siblings, 2 replies; 25+ messages in thread
From: Thomas Gleixner @ 2012-05-26 10:18 UTC (permalink / raw)
To: Suresh Siddha
Cc: Dimitri Sivanich, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Fri, 25 May 2012, Suresh Siddha wrote:
> On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
> There are other (not-so common) irq desc references, like in the
> show_interrupts() (cat /proc/interrupts path) etc, that does things like
> this in the process context:
>
> desc = irq_to_desc(i);
> if (!desc)
> return 0;
>
> raw_spin_lock_irqsave(&desc->lock, flags);
>
> May be we should introduce something like
> get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
> irq desc with pre-emption/irq's disabled and lock it etc. And the
> synchronize_sched() will enable the destroy_irq()/free_desc() to free it
> safely etc.
I want to avoid that and instead use proper refcounting. The reason is
that we want to move the irq descriptor when the affinity changes
nodes, and for that we need refcounting anyway.
Thanks,
tglx
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-26 10:18 ` Thomas Gleixner
@ 2012-05-27 1:41 ` Jiang Liu
2012-05-30 13:46 ` Dimitri Sivanich
1 sibling, 0 replies; 25+ messages in thread
From: Jiang Liu @ 2012-05-27 1:41 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Suresh Siddha, Dimitri Sivanich, Ingo Molnar, H. Peter Anvin, x86,
Yinghai Lu, Naga Chumbalkar, Jacob Pan, linux-kernel
I have realized the same issue with typical usages of for_each_irq_desc(),
which may access freed memory with SPARSE_IRQ. My naive solution was to
avoid freeing irq_desc even SPARSE_IRQ_ is enabled.
for_each_irq_desc(i, desc) {
raw_spin_lock_irq(&desc->lock);
On 05/26/2012 06:18 PM, Thomas Gleixner wrote:
> On Fri, 25 May 2012, Suresh Siddha wrote:
>> On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
>> There are other (not-so common) irq desc references, like in the
>> show_interrupts() (cat /proc/interrupts path) etc, that does things like
>> this in the process context:
>>
>> desc = irq_to_desc(i);
>> if (!desc)
>> return 0;
>>
>> raw_spin_lock_irqsave(&desc->lock, flags);
>>
>> May be we should introduce something like
>> get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
>> irq desc with pre-emption/irq's disabled and lock it etc. And the
>> synchronize_sched() will enable the destroy_irq()/free_desc() to free it
>> safely etc.
>
> I want to avoid that and instead use proper refcounting. The reason is
> that we want to move the irq descriptor when the affinity changes
> nodes, and for that we need refcounting anyway.
>
> Thanks,
>
> tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-26 10:18 ` Thomas Gleixner
2012-05-27 1:41 ` Jiang Liu
@ 2012-05-30 13:46 ` Dimitri Sivanich
1 sibling, 0 replies; 25+ messages in thread
From: Dimitri Sivanich @ 2012-05-30 13:46 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Suresh Siddha, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel, akpm, gregkh, stable
On Sat, May 26, 2012 at 12:18:21PM +0200, Thomas Gleixner wrote:
> On Fri, 25 May 2012, Suresh Siddha wrote:
> > On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
> > There are other (not-so common) irq desc references, like in the
> > show_interrupts() (cat /proc/interrupts path) etc, that does things like
> > this in the process context:
> >
> > desc = irq_to_desc(i);
> > if (!desc)
> > return 0;
> >
> > raw_spin_lock_irqsave(&desc->lock, flags);
> >
> > May be we should introduce something like
> > get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
> > irq desc with pre-emption/irq's disabled and lock it etc. And the
> > synchronize_sched() will enable the destroy_irq()/free_desc() to free it
> > safely etc.
>
> I want to avoid that and instead use proper refcounting. The reason is
> that we want to move the irq descriptor when the affinity changes
> nodes, and for that we need refcounting anyway.
>
While this proposal sounds good, in the meantime would there be any harm in
putting the NULL cfg check into smp_irq_move_cleanup_interrupt()?
It's a minimal change, and eliminates the panics that I've encountered thus far.
Reposting the patch.
A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
In create_irq_nr() there is a window where we have set vector_irq in
__assign_irq_vector(), but not yet called irq_set_chip_data() to set the
irq_cfg pointer.
Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
irq, but panic when accessing irq_cfg.
There is also a window in destroy_irq() where we've cleared the irq_cfg
pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
Only continue processing the irq if irq_cfg is non-NULL.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
---
arch/x86/kernel/apic/io_apic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Index: linux/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic/io_apic.c
+++ linux/arch/x86/kernel/apic/io_apic.c
@@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
if (!desc)
continue;
- cfg = irq_cfg(irq);
raw_spin_lock(&desc->lock);
+ cfg = irq_cfg(irq);
+ if (!cfg)
+ goto unlock;
+
/*
* Check if the irq migration is in progress. If so, we
* haven't received the cleanup request yet for this irq.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-23 23:49 ` Suresh Siddha
2012-05-24 1:40 ` Dimitri Sivanich
2012-05-24 14:37 ` Dimitri Sivanich
@ 2012-05-24 14:53 ` Thomas Gleixner
2012-05-24 15:36 ` Dimitri Sivanich
2 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2012-05-24 14:53 UTC (permalink / raw)
To: Suresh Siddha
Cc: Dimitri Sivanich, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Wed, 23 May 2012, Suresh Siddha wrote:
> On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > OK. Hopefully this covers it.
>
> Sorry No. Now you will understand why Thomas wanted detailed changelog.
> I found one more issue with the help of your new modification to the
> changelog.
>
> > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> >
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.
> >
> > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > irq, but panic when accessing irq_cfg.
> >
> > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
>
> So, what happens if the irq_desc gets freed by the destroy_irq() in the
> sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> irq desc memory! Right?
>
> May we should really do something like the appended (untested patch)?
> Can you please review and give this a try? Let me review a bit more to
> see if this really fixes the issue.
It's fixing the problem.
But this move_cleanup stuff could be made less stupid.
The check for irq_desc is superflous. irq_cfg() calls
irq_get_chip_data() which will return NULL if the irq descriptor is
not there.
To avoid the lookup business completely we should really store
irq_desc instead of the irq number in the per cpu vector array, that
would also get rid of the lookup in the irq delivery path.
Now that still needs to iterate over all vectors, but this could be
optimized in a second step.
In complete_move() we send the IPI to all cpus in the old mask. We
really should set the corresponding vector bit in a per cpu bitfield
on those cpus in the mask. The cleanup can rely on the bits and avoid
looking at 200+ vectors to find a single one.
Thoughts?
tglx
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
2012-05-24 14:53 ` Thomas Gleixner
@ 2012-05-24 15:36 ` Dimitri Sivanich
0 siblings, 0 replies; 25+ messages in thread
From: Dimitri Sivanich @ 2012-05-24 15:36 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Suresh Siddha, Ingo Molnar, H. Peter Anvin, x86, Yinghai Lu,
Naga Chumbalkar, Jacob Pan, linux-kernel
On Thu, May 24, 2012 at 04:53:06PM +0200, Thomas Gleixner wrote:
> On Wed, 23 May 2012, Suresh Siddha wrote:
> > On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > > OK. Hopefully this covers it.
> >
> > Sorry No. Now you will understand why Thomas wanted detailed changelog.
> > I found one more issue with the help of your new modification to the
> > changelog.
> >
> > > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> > >
> > > In create_irq_nr() there is a window where we have set vector_irq in
> > > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > > irq_cfg pointer.
> > >
> > > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > > irq, but panic when accessing irq_cfg.
> > >
> > > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
> >
> > So, what happens if the irq_desc gets freed by the destroy_irq() in the
> > sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> > irq desc memory! Right?
> >
> > May we should really do something like the appended (untested patch)?
> > Can you please review and give this a try? Let me review a bit more to
> > see if this really fixes the issue.
>
> It's fixing the problem.
>
> But this move_cleanup stuff could be made less stupid.
>
> The check for irq_desc is superflous. irq_cfg() calls
> irq_get_chip_data() which will return NULL if the irq descriptor is
> not there.
>
> To avoid the lookup business completely we should really store
> irq_desc instead of the irq number in the per cpu vector array, that
> would also get rid of the lookup in the irq delivery path.
So if the irq_desc gets deallocated you would clear all corresponding
per cpu vector_irq references before deallocation, protecting accesses
by smp_irq_move_cleanup_interrupt?
>
> Now that still needs to iterate over all vectors, but this could be
> optimized in a second step.
>
> In complete_move() we send the IPI to all cpus in the old mask. We
> really should set the corresponding vector bit in a per cpu bitfield
> on those cpus in the mask. The cleanup can rely on the bits and avoid
> looking at 200+ vectors to find a single one.
This part does sound more efficient at first glance.
>
> Thoughts?
>
> tglx
^ permalink raw reply [flat|nested] 25+ messages in thread