From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Mon, 28 Jan 2013 11:20:51 +0100
From: Ingo Molnar
To: Jan Beulich
Cc: Linus Torvalds, Milton Miller, Wang YanQing, Mike Galbraith,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton, "Srivatsa S. Bhat",
	mina86@mina86.org, Linux Kernel Mailing List, stable
Subject: Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
Message-ID: <20130128102051.GA20263@gmail.com>
References: <20130126075357.GA3205@udknight>
	<20130127155043.GA6214@gmail.com>
	<5106523302000078000B9F0B@nat28.tlf.novell.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5106523302000078000B9F0B@nat28.tlf.novell.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:

* Jan Beulich wrote:

> >>> On 27.01.13 at 16:50, Ingo Molnar wrote:
>
> > * Linus Torvalds wrote:
> >
> >> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing wrote:
> >> > I get below warning every day with 3.7,
> >> > one or two times per day.
> >> >
> >> > [ 2235.186027] WARNING: at
> > /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109
> > default_send_IPI_mask_logical+0x2f/0xb8()
> >> > [ 2235.186030] Hardware name: Aspire 4741
> >> > [ 2235.186032] empty IPI mask
> >> > [ 2235.186079] [] native_send_call_func_ipi+0x4f/0x57
> >> > [ 2235.186087] [] smp_call_function_many+0x191/0x1a9
> >> > [ 2235.186097] [] native_flush_tlb_others+0x21/0x24
> >> > [ 2235.186101] [] flush_tlb_page+0x63/0x89
> >> > [ 2235.186105] [] ptep_set_access_flags+0x20/0x26
> >> > [ 2235.186111] [] do_wp_page+0x234/0x502
> >> > [ 2235.186121] [] handle_pte_fault+0x50d/0x54c
> >> > [ 2235.186148] [] handle_mm_fault+0xd0/0xe2
> >> > [ 2235.186153] [] __do_page_fault+0x411/0x42d
> >> > [ 2235.186166] [] do_page_fault+0x8/0xa
> >> > [ 2235.186170] [] error_code+0x5a/0x60
> >> >
> >> > This patch fix it.
> >> >
> >> > This patch also fix some system hang problem:
> >> > If the data->cpumask been cleared after pass
> >> >
> >> > if (WARN_ONCE(!mask, "empty IPI mask"))
> >> >     return;
> >> > then the problem 83d349f3 fix will happen again.
> >>
> >> Hmm. We have very consciously tried to avoid the extra copy, although
> >> I'm not entirely sure why (it might possibly hurt on the MAXSMP
> >> configuration).
> >>
> >> See for example commit 723aae25d5cd ("smp_call_function_many: handle
> >> concurrent clearing of mask") which fixed another version of this
> >> problem.
> >>
> >> But I do agree that it looks like the copy is required, simply because
> >> - as you say - once we've done the "list_add_rcu()" to add it to the
> >> queue, we can have (another) IPI to the target CPU that can now see it
> >> and clear the mask.
> >>
> >> So by the time we get to actually send the IPI, the mask might have
> >> been cleared by another IPI. So I do agree that your patch seems
> >> correct, but I really really want to run it by other people.
> >>
> >> Guys? Original patch on lkml. The other possible fix might be
> >> to take the &call_function.lock earlier in
> >> generic_smp_call_function_interrupt(), so that we can never
> >> clear the bit while somebody is adding entries to the list...
> >> But I think it very much tries to avoid that on purpose right
> >> now, with only the last CPU responding to that IPI taking the
> >> lock.
> >>
> >> So copying the IPI mask seems to be the reasonable approach.
> >> Comments?
> >
> > Agreed, looks correct to me as well - I've queued the fix up in
> > tip:x86/urgent.
>
> But the patch is obviously incomplete for the CPUMASK_OFFSTACK
> case, as the newly added cpumask_ipi member never gets its bit
> array allocated.

Yes, indeed - I'll amend it with the fix.

Thanks,

	Ingo
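
For readers following the thread, here is a minimal userspace sketch of the two points discussed above: taking a private copy of the mask before the entry becomes visible, and allocating storage for that copy when masks live off-stack. It is not the kernel patch. All names here (mask_var_t, struct call_data, call_data_init(), send_call(), clear_all()) are hypothetical stand-ins, a single 64-bit word stands in for a cpumask, and the "concurrent" clearing is modelled by a plain callback rather than by real IPIs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an off-stack cpumask: the bit array is a
 * separate heap allocation and must be allocated before first use. */
typedef struct { uint64_t *bits; } mask_var_t;

static bool mask_alloc(mask_var_t *m)
{
    m->bits = calloc(1, sizeof(*m->bits));
    return m->bits != NULL;
}

static void mask_free(mask_var_t *m)
{
    free(m->bits);
    m->bits = NULL;
}

struct call_data {
    mask_var_t cpumask;     /* shared: responders clear their own bit */
    mask_var_t cpumask_ipi; /* private snapshot, read only by the sender */
};

/* Both members need their bit array allocated; forgetting the second
 * one is the kind of omission pointed out in the review above. */
static bool call_data_init(struct call_data *d)
{
    if (!mask_alloc(&d->cpumask))
        return false;
    if (!mask_alloc(&d->cpumask_ipi)) {
        mask_free(&d->cpumask);
        return false;
    }
    return true;
}

static void call_data_destroy(struct call_data *d)
{
    mask_free(&d->cpumask_ipi);
    mask_free(&d->cpumask);
}

/* Models a handler on another CPU running as soon as the entry is visible. */
typedef void (*publish_hook_t)(struct call_data *d);

static void clear_all(struct call_data *d)
{
    *d->cpumask.bits = 0;
}

/* Returns the mask the IPI would be sent to. With use_copy == false the
 * sender re-reads the shared mask after publishing, so it can see 0. */
static uint64_t send_call(struct call_data *d, uint64_t targets,
                          bool use_copy, publish_hook_t on_publish)
{
    *d->cpumask.bits = targets;
    if (use_copy)
        *d->cpumask_ipi.bits = *d->cpumask.bits; /* snapshot before publish */
    if (on_publish)
        on_publish(d); /* entry is now visible; bits may be cleared */
    return use_copy ? *d->cpumask_ipi.bits : *d->cpumask.bits;
}
```

Calling send_call() with clear_all as the hook shows the difference: without the copy the sender ends up with an empty mask (the "empty IPI mask" case), while with the copy it still targets the CPUs it originally chose. The price is the extra mask copy per call that the thread mentions as a concern for very large CPU counts.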