* RCU can use cpu_active_map?
From: Alex Chiang @ 2009-02-09 20:13 UTC (permalink / raw)
To: tony.luck, Paul E. McKenney, linux-ia64, linux-kernel
Paul,
I don't pretend to understand RCU, but a very quick and naive
look through rcupreempt.c makes me think that we could use the
cpu_active_map instead of cpu_online_map?
cpu_active_map was introduced by e761b772.
In the CPU hotplug path, we touch the cpu_active_map very early
on:
int __ref cpu_down(unsigned int cpu)
{
	int err;

	err = stop_machine_create();
	if (err)
		return err;
	cpu_maps_update_begin();

	if (cpu_hotplug_disabled) {
		err = -EBUSY;
		goto out;
	}

	cpu_clear(cpu, cpu_active_map);
	/* ... */
	synchronize_sched();
	err = _cpu_down(cpu, 0);
	if (cpu_online(cpu))
		cpu_set(cpu, cpu_active_map);

out:
	cpu_maps_update_done();
	stop_machine_destroy();
	return err;
}
The call to _cpu_down() is where we eventually get to the code
that my patch below touches, so you can see that we mark the CPU
as !active before we ever get to the step of migrating interrupts
(which relies on cpu_online_map).
If RCU looked at cpu_active_map instead of cpu_online_map, it
seems like we would avoid the potential race situation you
mentioned earlier.
Alternatively, I could explore just playing with the ia64
interrupt migration code to use cpu_active_mask instead, but
wanted to get your thoughts from the RCU perspective.
Thanks.
/ac
* Alex Chiang <achiang@hp.com>:
> This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.
>
> Commit e7b14036 removes the targeted disabled CPU from the
> cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
>
> Paul McKenney states that the reasoning behind the patch was to
> prevent irq handlers from running on CPUs marked offline because:
>
> RCU happily ignores CPUs that don't have their bits set in
> cpu_online_map, so if there are RCU read-side critical sections
> in the irq handlers being run, RCU will ignore them. If the
> other CPUs were running, they might sequence through the RCU
> state machine, which could result in data structures being
> yanked out from under those irq handlers, which in turn could
> result in oopses or worse.
>
> Unfortunately, both ia64 functions above look at cpu_online_map to find
> a new CPU to migrate interrupts onto. This means we can potentially
> migrate an interrupt off ourself back to... ourself. Uh oh.
>
> This causes an oops when we finally try to process pending interrupts on
> the CPU we want to disable. The oops results from calling __do_IRQ with
> a NULL pt_regs:
>
> Unable to handle kernel NULL pointer dereference (address 0000000000000040)
> Call Trace:
> [<a000000100016930>] show_stack+0x50/0xa0
> sp=e0000009c922fa00 bsp=e0000009c92214d0
> [<a0000001000171a0>] show_regs+0x820/0x860
> sp=e0000009c922fbd0 bsp=e0000009c9221478
> [<a00000010003c700>] die+0x1a0/0x2e0
> sp=e0000009c922fbd0 bsp=e0000009c9221438
> [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
> sp=e0000009c922fbd0 bsp=e0000009c92213d8
> [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
> sp=e0000009c922fc60 bsp=e0000009c92213d8
> [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
> sp=e0000009c922fe30 bsp=e0000009c9221398
> [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
> sp=e0000009c922fe30 bsp=e0000009c9221330
> [<a00000010013a800>] handle_IRQ_event+0x80/0x120
> sp=e0000009c922fe30 bsp=e0000009c92212f8
> [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
> sp=e0000009c922fe30 bsp=e0000009c9221290
> [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
> sp=e0000009c922fe30 bsp=e0000009c9221208
> [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
> sp=e0000009c922fe30 bsp=e0000009c92211a8
> [<a00000010005bd80>] __cpu_disable+0x140/0x240
> sp=e0000009c922fe30 bsp=e0000009c9221168
> [<a0000001006c5870>] take_cpu_down+0x50/0xa0
> sp=e0000009c922fe30 bsp=e0000009c9221148
> [<a000000100122610>] stop_cpu+0xd0/0x200
> sp=e0000009c922fe30 bsp=e0000009c92210f0
> [<a0000001000e0440>] kthread+0xc0/0x140
> sp=e0000009c922fe30 bsp=e0000009c92210c8
> [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
> sp=e0000009c922fe30 bsp=e0000009c92210a0
> [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
> sp=e0000009c922fe30 bsp=e0000009c92210a0
>
> I don't like this revert because it is fragile. ia64 is getting lucky
> because we seem to only ever process timer interrupts in this path, but
> if we ever race with an IPI here, we definitely use RCU and have the
> potential of hitting an oops that Paul describes above.
>
> Patching ia64's timer_interrupt() to check for NULL pt_regs is
> insufficient though, as we still hit the above oops.
>
> As a short term solution, I do think that this revert is the right
> answer. The revert held up under repeated testing (24+ hour test runs)
> with this setup:
>
> - 8-way rx6600
> - randomly toggling CPU online/offline state every 2 seconds
> - running CPU exercisers, memory hog, disk exercisers, and
> network stressors
> - average system load around ~160
>
> In the long term, we really need to figure out why we set pt_regs = NULL
> in ia64_process_pending_intr(). If it turns out that it is unnecessary
> to do so, then we could safely re-introduce e7b14036 (along with some
> other logic to be smarter about migrating interrupts).
>
> One final note: x86 also removes the disabled CPU from cpu_online_map
> and then re-enables interrupts for 1ms, presumably to handle any pending
> interrupts:
>
> arch/x86/kernel/irq_32.c (and irq_64.c):
>
> 	cpu_disable_common:
> 		[remove cpu from cpu_online_map]
>
> 	fixup_irqs():
> 		for_each_irq:
> 			[break CPU affinities]
>
> 		local_irq_enable();
> 		mdelay(1);
> 		local_irq_disable();
>
> So they are doing implicitly what ia64 is doing explicitly.
>
> Cc: stable@kernel.org
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Alex Chiang <achiang@hp.com>
> ---
> arch/ia64/kernel/smpboot.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
> index 1146399..2ec5bbf 100644
> --- a/arch/ia64/kernel/smpboot.c
> +++ b/arch/ia64/kernel/smpboot.c
> @@ -736,14 +736,16 @@ int __cpu_disable(void)
>  		return -EBUSY;
>  	}
>
> +	cpu_clear(cpu, cpu_online_map);
> +
>  	if (migrate_platform_irqs(cpu)) {
>  		cpu_set(cpu, cpu_online_map);
>  		return (-EBUSY);
>  	}
>
>  	remove_siblinginfo(cpu);
> -	fixup_irqs();
>  	cpu_clear(cpu, cpu_online_map);
> +	fixup_irqs();
>  	local_flush_tlb_all();
>  	cpu_clear(cpu, cpu_callin_map);
>  	return 0;
> --
> 1.6.0.1.161.g7f314
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: RCU can use cpu_active_map?
From: Paul E. McKenney @ 2009-02-09 21:39 UTC (permalink / raw)
To: Alex Chiang, tony.luck, linux-ia64, linux-kernel
On Mon, Feb 09, 2009 at 01:13:45PM -0700, Alex Chiang wrote:
> Paul,
>
> I don't pretend to understand RCU, but a very quick and naive
> look through rcupreempt.c makes me think that we could use the
> cpu_active_map instead of cpu_online_map?
>
> cpu_active_map was introduced by e761b772.
>
> In the CPU hotplug path, we touch the cpu_active_map very early
> on:
>
> int __ref cpu_down(unsigned int cpu)
> {
> 	int err;
>
> 	err = stop_machine_create();
> 	if (err)
> 		return err;
> 	cpu_maps_update_begin();
>
> 	if (cpu_hotplug_disabled) {
> 		err = -EBUSY;
> 		goto out;
> 	}
>
> 	cpu_clear(cpu, cpu_active_map);
> 	/* ... */
> 	synchronize_sched();
> 	err = _cpu_down(cpu, 0);
> 	if (cpu_online(cpu))
> 		cpu_set(cpu, cpu_active_map);
>
> out:
> 	cpu_maps_update_done();
> 	stop_machine_destroy();
> 	return err;
> }
>
> The call to _cpu_down() is where we eventually get to the code
> that my patch below touches, so you can see that we mark the CPU
> as !active before we ever get to the step of migrating interrupts
> (which relies on cpu_online_map).
>
> If RCU looked at cpu_active_map instead of cpu_online_map, it
> seems like we would avoid the potential race situation you
> mentioned earlier.
>
> Alternatively, I could explore just playing with the ia64
> interrupt migration code to use cpu_active_mask instead, but
> wanted to get your thoughts from the RCU perspective.
Perhaps I am confused, but if the CPU is on its way down, doesn't RCU
need a mask where the CPU's bit stays set longer rather than shorter?
If I use cpu_active_mask, couldn't there be device interrupts during
(for example) the synchronize_sched(), which might have RCU read-side
critical sections that RCU needs to pay attention to?
Thanx, Paul
> Thanks.
>
> /ac
>
>
> * Alex Chiang <achiang@hp.com>:
> > This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.
> >
> > Commit e7b14036 removes the targeted disabled CPU from the
> > cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
> >
> > Paul McKenney states that the reasoning behind the patch was to
> > prevent irq handlers from running on CPUs marked offline because:
> >
> > RCU happily ignores CPUs that don't have their bits set in
> > cpu_online_map, so if there are RCU read-side critical sections
> > in the irq handlers being run, RCU will ignore them. If the
> > other CPUs were running, they might sequence through the RCU
> > state machine, which could result in data structures being
> > yanked out from under those irq handlers, which in turn could
> > result in oopses or worse.
> >
> > Unfortunately, both ia64 functions above look at cpu_online_map to find
> > a new CPU to migrate interrupts onto. This means we can potentially
> > migrate an interrupt off ourself back to... ourself. Uh oh.
> >
> > This causes an oops when we finally try to process pending interrupts on
> > the CPU we want to disable. The oops results from calling __do_IRQ with
> > a NULL pt_regs:
> >
> > Unable to handle kernel NULL pointer dereference (address 0000000000000040)
> > Call Trace:
> > [<a000000100016930>] show_stack+0x50/0xa0
> > sp=e0000009c922fa00 bsp=e0000009c92214d0
> > [<a0000001000171a0>] show_regs+0x820/0x860
> > sp=e0000009c922fbd0 bsp=e0000009c9221478
> > [<a00000010003c700>] die+0x1a0/0x2e0
> > sp=e0000009c922fbd0 bsp=e0000009c9221438
> > [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
> > sp=e0000009c922fbd0 bsp=e0000009c92213d8
> > [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
> > sp=e0000009c922fc60 bsp=e0000009c92213d8
> > [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
> > sp=e0000009c922fe30 bsp=e0000009c9221398
> > [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
> > sp=e0000009c922fe30 bsp=e0000009c9221330
> > [<a00000010013a800>] handle_IRQ_event+0x80/0x120
> > sp=e0000009c922fe30 bsp=e0000009c92212f8
> > [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
> > sp=e0000009c922fe30 bsp=e0000009c9221290
> > [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
> > sp=e0000009c922fe30 bsp=e0000009c9221208
> > [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
> > sp=e0000009c922fe30 bsp=e0000009c92211a8
> > [<a00000010005bd80>] __cpu_disable+0x140/0x240
> > sp=e0000009c922fe30 bsp=e0000009c9221168
> > [<a0000001006c5870>] take_cpu_down+0x50/0xa0
> > sp=e0000009c922fe30 bsp=e0000009c9221148
> > [<a000000100122610>] stop_cpu+0xd0/0x200
> > sp=e0000009c922fe30 bsp=e0000009c92210f0
> > [<a0000001000e0440>] kthread+0xc0/0x140
> > sp=e0000009c922fe30 bsp=e0000009c92210c8
> > [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
> > sp=e0000009c922fe30 bsp=e0000009c92210a0
> > [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
> > sp=e0000009c922fe30 bsp=e0000009c92210a0
> >
> > I don't like this revert because it is fragile. ia64 is getting lucky
> > because we seem to only ever process timer interrupts in this path, but
> > if we ever race with an IPI here, we definitely use RCU and have the
> > potential of hitting an oops that Paul describes above.
> >
> > Patching ia64's timer_interrupt() to check for NULL pt_regs is
> > insufficient though, as we still hit the above oops.
> >
> > As a short term solution, I do think that this revert is the right
> > answer. The revert held up under repeated testing (24+ hour test runs)
> > with this setup:
> >
> > - 8-way rx6600
> > - randomly toggling CPU online/offline state every 2 seconds
> > - running CPU exercisers, memory hog, disk exercisers, and
> > network stressors
> > - average system load around ~160
> >
> > In the long term, we really need to figure out why we set pt_regs = NULL
> > in ia64_process_pending_intr(). If it turns out that it is unnecessary
> > to do so, then we could safely re-introduce e7b14036 (along with some
> > other logic to be smarter about migrating interrupts).
> >
> > One final note: x86 also removes the disabled CPU from cpu_online_map
> > and then re-enables interrupts for 1ms, presumably to handle any pending
> > interrupts:
> >
> > arch/x86/kernel/irq_32.c (and irq_64.c):
> >
> > 	cpu_disable_common:
> > 		[remove cpu from cpu_online_map]
> >
> > 	fixup_irqs():
> > 		for_each_irq:
> > 			[break CPU affinities]
> >
> > 		local_irq_enable();
> > 		mdelay(1);
> > 		local_irq_disable();
> >
> > So they are doing implicitly what ia64 is doing explicitly.
> >
> > Cc: stable@kernel.org
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Alex Chiang <achiang@hp.com>
> > ---
> > arch/ia64/kernel/smpboot.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
> > index 1146399..2ec5bbf 100644
> > --- a/arch/ia64/kernel/smpboot.c
> > +++ b/arch/ia64/kernel/smpboot.c
> > @@ -736,14 +736,16 @@ int __cpu_disable(void)
> >  		return -EBUSY;
> >  	}
> >
> > +	cpu_clear(cpu, cpu_online_map);
> > +
> >  	if (migrate_platform_irqs(cpu)) {
> >  		cpu_set(cpu, cpu_online_map);
> >  		return (-EBUSY);
> >  	}
> >
> >  	remove_siblinginfo(cpu);
> > -	fixup_irqs();
> >  	cpu_clear(cpu, cpu_online_map);
> > +	fixup_irqs();
> >  	local_flush_tlb_all();
> >  	cpu_clear(cpu, cpu_callin_map);
> >  	return 0;
> > --
> > 1.6.0.1.161.g7f314
> >
* Re: RCU can use cpu_active_map?
From: Alex Chiang @ 2009-02-09 22:50 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: tony.luck, linux-ia64, linux-kernel
* Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Mon, Feb 09, 2009 at 01:13:45PM -0700, Alex Chiang wrote:
> > Paul,
> >
> > I don't pretend to understand RCU, but a very quick and naive
> > look through rcupreempt.c makes me think that we could use the
> > cpu_active_map instead of cpu_online_map?
> >
> > cpu_active_map was introduced by e761b772.
> >
> > In the CPU hotplug path, we touch the cpu_active_map very early
> > on:
> >
> > int __ref cpu_down(unsigned int cpu)
> > {
> > int err;
> > err = stop_machine_create();
> > if (err)
> > return err;
> > cpu_maps_update_begin();
> >
> > if (cpu_hotplug_disabled) {
> > err = -EBUSY;
> > goto out;
> > }
> >
> > cpu_clear(cpu, cpu_active_map);
> > /* ... */
> > synchronize_sched();
> > err = _cpu_down(cpu, 0);
> > if (cpu_online(cpu))
> > cpu_set(cpu, cpu_active_map);
> >
> > out:
> > cpu_maps_update_done();
> > stop_machine_destroy();
> > return err;
> > }
> >
> > The call to _cpu_down() is where we eventually get to the code
> > that my patch below touches, so you can see that we mark the CPU
> > as !active before we ever get to the step of migrating interrupts
> > (which relies on cpu_online_map).
> >
> > If RCU looked at cpu_active_map instead of cpu_online_map, it
> > seems like we would avoid the potential race situation you
> > mentioned earlier.
> >
> > Alternatively, I could explore just playing with the ia64
> > interrupt migration code to use cpu_active_mask instead, but
> > wanted to get your thoughts from the RCU perspective.
>
> Perhaps I am confused, but if the CPU is on its way down, doesn't RCU
> need a mask where the CPU's bit stays set longer rather than shorter?
>
> If I use cpu_active_mask, couldn't there be device interrupts during
> (for example) the synchronize_sched(), which might have RCU read-side
> critical sections that RCU needs to pay attention to?
Hm, I think you're right.
Thanks.
/ac