From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754782AbZBIVjm (ORCPT ); Mon, 9 Feb 2009 16:39:42 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752970AbZBIVjc (ORCPT ); Mon, 9 Feb 2009 16:39:32 -0500 Received: from e1.ny.us.ibm.com ([32.97.182.141]:39645 "EHLO e1.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752245AbZBIVjb (ORCPT ); Mon, 9 Feb 2009 16:39:31 -0500 Date: Mon, 9 Feb 2009 13:39:29 -0800 From: "Paul E. McKenney" To: Alex Chiang , tony.luck@intel.com, linux-ia64@vger.kernel.org, linux-kernel Subject: Re: RCU can use cpu_active_map? Message-ID: <20090209213929.GQ6802@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20090209201345.GB13107@ldl.fc.hp.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090209201345.GB13107@ldl.fc.hp.com> User-Agent: Mutt/1.5.15+20070412 (2007-04-11) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 09, 2009 at 01:13:45PM -0700, Alex Chiang wrote: > Paul, > > I don't pretend to understand RCU, but a very quick and naive > look through rcupreempt.c makes me think that we could use the > cpu_active_map instead of cpu_online_map? > > cpu_active_map was introduced by e761b772. > > In the CPU hotplug path, we touch the cpu_active_map very early > on: > > int __ref cpu_down(unsigned int cpu) > { > int err; > err = stop_machine_create(); > if (err) > return err; > cpu_maps_update_begin(); > > if (cpu_hotplug_disabled) { > err = -EBUSY; > goto out; > } > > cpu_clear(cpu, cpu_active_map); > /* ... */ > synchronize_sched(); > err = _cpu_down(cpu, 0); > if (cpu_online(cpu)) > cpu_set(cpu, cpu_active_map); > > out: > cpu_maps_update_done(); > stop_machine_destroy(); > return err; > } > > The call to _cpu_down() is where we eventually get to the code > that my patch below touches, so you can see that we mark the CPU > as !active before we ever get to the step of migrating interrupts > (which relies on cpu_online_map). > > If RCU looked at cpu_active_map instead of cpu_online_map, it > seems like we would avoid the potential race situation you > mentioned earlier. > > Alternatively, I could explore just playing with the ia64 > interrupt migration code to use cpu_active_mask instead, but > wanted to get your thoughts from the RCU perspective. Perhaps I am confused, but if the CPU is on its way down, doesn't RCU need a mask where the CPU's bit stays set longer rather than shorter? If I use cpu_active_mask, couldn't there be device interrupts during (for example) the synchronize_sched(), which might have RCU read-side critical sections that RCU needs to pay attention to? Thanx, Paul > Thanks. > > /ac > > > * Alex Chiang : > > This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a. > > > > Commit e7b14036 removes the targetted disabled CPU from the > > cpu_online_map after calls to migrate_platform_irqs and fixup_irqs. > > > > Paul McKenney states that the reasoning behind the patch was to > > prevent irq handlers from running on CPUs marked offline because: > > > > RCU happily ignores CPUs that don't have their bits set in > > cpu_online_map, so if there are RCU read-side critical sections > > in the irq handlers being run, RCU will ignore them. If the > > other CPUs were running, they might sequence through the RCU > > state machine, which could result in data structures being > > yanked out from under those irq handlers, which in turn could > > result in oopses or worse. > > > > Unfortunately, both ia64 functions above look at cpu_online_map to find > > a new CPU to migrate interrupts onto. This means we can potentially > > migrate an interrupt off ourself back to... ourself. Uh oh. > > > > This causes an oops when we finally try to process pending interrupts on > > the CPU we want to disable. The oops results from calling __do_IRQ with > > a NULL pt_regs: > > > > Unable to handle kernel NULL pointer dereference (address 0000000000000040) > > Call Trace: > > [] show_stack+0x50/0xa0 > > sp=e0000009c922fa00 bsp=e0000009c92214d0 > > [] show_regs+0x820/0x860 > > sp=e0000009c922fbd0 bsp=e0000009c9221478 > > [] die+0x1a0/0x2e0 > > sp=e0000009c922fbd0 bsp=e0000009c9221438 > > [] ia64_do_page_fault+0x950/0xa80 > > sp=e0000009c922fbd0 bsp=e0000009c92213d8 > > [] ia64_native_leave_kernel+0x0/0x270 > > sp=e0000009c922fc60 bsp=e0000009c92213d8 > > [] profile_tick+0xd0/0x1c0 > > sp=e0000009c922fe30 bsp=e0000009c9221398 > > [] timer_interrupt+0x170/0x3e0 > > sp=e0000009c922fe30 bsp=e0000009c9221330 > > [] handle_IRQ_event+0x80/0x120 > > sp=e0000009c922fe30 bsp=e0000009c92212f8 > > [] __do_IRQ+0x160/0x4a0 > > sp=e0000009c922fe30 bsp=e0000009c9221290 > > [] ia64_process_pending_intr+0x2b0/0x360 > > sp=e0000009c922fe30 bsp=e0000009c9221208 > > [] fixup_irqs+0xf0/0x2a0 > > sp=e0000009c922fe30 bsp=e0000009c92211a8 > > [] __cpu_disable+0x140/0x240 > > sp=e0000009c922fe30 bsp=e0000009c9221168 > > [] take_cpu_down+0x50/0xa0 > > sp=e0000009c922fe30 bsp=e0000009c9221148 > > [] stop_cpu+0xd0/0x200 > > sp=e0000009c922fe30 bsp=e0000009c92210f0 > > [] kthread+0xc0/0x140 > > sp=e0000009c922fe30 bsp=e0000009c92210c8 > > [] kernel_thread_helper+0xd0/0x100 > > sp=e0000009c922fe30 bsp=e0000009c92210a0 > > [] start_kernel_thread+0x20/0x40 > > sp=e0000009c922fe30 bsp=e0000009c92210a0 > > > > I don't like this revert because it is fragile. ia64 is getting lucky > > because we seem to only ever process timer interrupts in this path, but > > if we ever race with an IPI here, we definitely use RCU and have the > > potential of hitting an oops that Paul describes above. > > > > Patching ia64's timer_interrupt() to check for NULL pt_regs is > > insufficient though, as we still hit the above oops. > > > > As a short term solution, I do think that this revert is the right > > answer. The revert hold up under repeated testing (24+ hour test runs) > > with this setup: > > > > - 8-way rx6600 > > - randomly toggling CPU online/offline state every 2 seconds > > - running CPU exercisers, memory hog, disk exercisers, and > > network stressors > > - average system load around ~160 > > > > In the long term, we really need to figure out why we set pt_regs = NULL > > in ia64_process_pending_intr(). If it turns out that it is unnecessary > > to do so, then we could safely re-introduce e7b14036 (along with some > > other logic to be smarter about migrating interrupts). > > > > One final note: x86 also removes the disabled CPU from cpu_online_map > > and then re-enables interrupts for 1ms, presumably to handle any pending > > interrupts: > > > > arch/x86/kernel/irq_32.c (and irq_64.c): > > cpu_disable_common: > > [remove cpu from cpu_online_map] > > > > fixup_irqs(): > > for_each_irq: > > [break CPU affinities] > > > > local_irq_enable(); > > mdelay(1); > > local_irq_disable(); > > > > So they are doing implicitly what ia64 is doing explicitly. > > > > Cc: stable@kernel.org > > Cc: Paul E. McKenney > > Signed-off-by: Alex Chiang > > --- > > arch/ia64/kernel/smpboot.c | 4 +++- > > 1 files changed, 3 insertions(+), 1 deletions(-) > > > > diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c > > index 1146399..2ec5bbf 100644 > > --- a/arch/ia64/kernel/smpboot.c > > +++ b/arch/ia64/kernel/smpboot.c > > @@ -736,14 +736,16 @@ int __cpu_disable(void) > > return -EBUSY; > > } > > > > + cpu_clear(cpu, cpu_online_map); > > + > > if (migrate_platform_irqs(cpu)) { > > cpu_set(cpu, cpu_online_map); > > return (-EBUSY); > > } > > > > remove_siblinginfo(cpu); > > - fixup_irqs(); > > cpu_clear(cpu, cpu_online_map); > > + fixup_irqs(); > > local_flush_tlb_all(); > > cpu_clear(cpu, cpu_callin_map); > > return 0; > > -- > > 1.6.0.1.161.g7f314 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-ia64" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html