From: Hariprasad Nellitheertha <hari@in.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Randy.Dunlap" <rddunlap@osdl.org>,
r3pek@r3pek.homelinux.org, fastboot@lists.osdl.org,
linux-kernel@vger.kernel.org
Subject: Re: [Fastboot] Re: kexec "problem" [and patch updates]
Date: Thu, 4 Mar 2004 18:33:10 +0530
Message-ID: <20040304130310.GA7741@in.ibm.com>
In-Reply-To: <m1brnjcwpu.fsf@ebiederm.dsl.xmission.com>
Hello,
I recreated this on a UNI system running an SMP kernel as well.
The problem arises because, from 2.6.3 onwards, cpu_vm_mask for init_mm is
initialized to CPU_MASK_ALL, which sets every bit in cpumask to 1. After
cpus_and(tmp, cpumask, cpu_online_map), tmp therefore differs from cpumask
and BUG_ON(!cpus_equal(cpumask, tmp)) triggers. The change to set
cpu_vm_mask to CPU_MASK_ALL was made to remove tlb flush optimizations
for ppc64. On UNI kernels, CPU_MASK_ALL is 1 and hence the problem
does not occur.
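To illustrate the failure mode described above, here is a minimal user-space sketch, not kernel code. It assumes a cpumask that fits in one unsigned long, a two-CPU machine (online map 0x3), and a caller that has already cleared the current CPU's bit from the mask. All toy_* names are hypothetical stand-ins for the kernel helpers named in the comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpumask_t when NR_CPUS <= BITS_PER_LONG. */
typedef unsigned long toy_cpumask_t;

/* Hypothetical stand-in for CPU_MASK_ALL on an SMP build: every bit set. */
#define TOY_CPU_MASK_ALL (~0UL)

/*
 * Returns true when the pre-patch check
 *     BUG_ON(!cpus_equal(cpumask, tmp));
 * would fire for the given request mask and online map.
 */
bool toy_old_check_would_bug(toy_cpumask_t cpumask, toy_cpumask_t online)
{
    toy_cpumask_t tmp;

    assert(cpumask != 0);       /* mirrors BUG_ON(cpus_empty(cpumask)) */
    tmp = cpumask & online;     /* mirrors cpus_and(tmp, cpumask, cpu_online_map) */
    return tmp != cpumask;      /* mirrors !cpus_equal(cpumask, tmp) */
}
```

With these assumptions, a request of TOY_CPU_MASK_ALL & ~1UL against an online map of 0x3 makes the function return true, because the mask names CPUs that are not online. A request of 0x2 against the same map returns false.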
I made a small patch which fixes this problem. The change is, essentially,
to use "tmp" instead of "cpumask", and to return early if "tmp" is empty.
This ensures that only the (other) online cpus are sent the IPI.
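A rough user-space model of why the choice of mask matters for the wait loop on flush_cpumask is sketched here. It is an illustration under stated assumptions, not the kernel implementation: masks are single unsigned longs, and each online CPU is assumed to clear its own bit when it handles the IPI, while a CPU that is not online never does. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Patched target selection: restrict the request to online CPUs. */
unsigned long toy_flush_targets(unsigned long cpumask, unsigned long online)
{
    assert(cpumask != 0);       /* mirrors BUG_ON(cpus_empty(cpumask)) */
    return cpumask & online;    /* "tmp"; 0 means the patched code returns early */
}

/*
 * Bits still set in the (modelled) flush_cpumask after every online CPU
 * has acknowledged. A non-zero result means the loop
 *     while (!cpus_empty(flush_cpumask))
 * would never terminate in this model.
 */
unsigned long toy_pending_after_acks(unsigned long requested, unsigned long online)
{
    unsigned long flush_cpumask = requested;
    unsigned int nbits = (unsigned int)(sizeof(unsigned long) * CHAR_BIT);

    for (unsigned int cpu = 0; cpu < nbits; cpu++) {
        unsigned long bit = 1UL << cpu;

        if (online & bit)
            flush_cpumask &= ~bit;  /* assumed: the handler clears its own bit */
    }
    return flush_cpumask;
}
```

In this model, with an online map of 0x3 and a request of ~0UL & ~1UL, passing the raw request to toy_pending_after_acks leaves bits set for CPUs that do not exist, while passing toy_flush_targets(...) leaves none.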
I have done some testing with this patch. Kexec loads fine and I haven't seen
anything untoward.
Comments please.
Regards, Hari
diff -Naur linux-2.6.3-before/arch/i386/kernel/smp.c linux-2.6.3/arch/i386/kernel/smp.c
--- linux-2.6.3-before/arch/i386/kernel/smp.c 2004-02-18 09:27:15.000000000 +0530
+++ linux-2.6.3/arch/i386/kernel/smp.c 2004-03-04 14:16:43.000000000 +0530
@@ -356,7 +356,8 @@
BUG_ON(cpus_empty(cpumask));
cpus_and(tmp, cpumask, cpu_online_map);
- BUG_ON(!cpus_equal(cpumask, tmp));
+ if(cpus_empty(tmp))
+ return;
BUG_ON(cpu_isset(smp_processor_id(), cpumask));
BUG_ON(!mm);
@@ -371,12 +372,12 @@
flush_mm = mm;
flush_va = va;
#if NR_CPUS <= BITS_PER_LONG
- atomic_set_mask(cpumask, &flush_cpumask);
+ atomic_set_mask(tmp, &flush_cpumask);
#else
{
int k;
unsigned long *flush_mask = (unsigned long *)&flush_cpumask;
- unsigned long *cpu_mask = (unsigned long *)&cpumask;
+ unsigned long *cpu_mask = (unsigned long *)&tmp;
for (k = 0; k < BITS_TO_LONGS(NR_CPUS); ++k)
atomic_set_mask(cpu_mask[k], &flush_mask[k]);
}
@@ -385,7 +386,7 @@
* We have to send the IPI only to
* CPUs affected.
*/
- send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
+ send_IPI_mask(tmp, INVALIDATE_TLB_VECTOR);
while (!cpus_empty(flush_cpumask))
/* nothing. lockup detection does not belong here */
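The NR_CPUS > BITS_PER_LONG branch of the patch ORs the mask into flush_cpumask one word at a time. A hedged user-space analogue using C11 atomics is sketched here. It assumes atomic_fetch_or from <stdatomic.h> as a stand-in for the kernel's atomic_set_mask, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * OR each word of cpu_mask into the matching word of flush_mask.
 * Each word is updated atomically, but the update of the mask as a
 * whole is not atomic, just as in the per-word loop in the patch.
 */
void toy_set_mask_words(_Atomic unsigned long *flush_mask,
                        const unsigned long *cpu_mask,
                        size_t nwords)
{
    assert(flush_mask != NULL && cpu_mask != NULL);

    for (size_t k = 0; k < nwords; ++k)
        atomic_fetch_or(&flush_mask[k], cpu_mask[k]);
}
```

Passing the online-filtered mask ("tmp" in the patch) as cpu_mask means only bits for online CPUs are ever set in the shared mask.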
On Sat, Feb 28, 2004 at 03:41:33AM -0700, Eric W. Biederman wrote:
> "Randy.Dunlap" <rddunlap@osdl.org> writes:
>
> > On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:
> >
> > | > It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
> > | > If the kernel is built for SMP, when running kexec, I get a
> > | > BUG in arch/i386/kernel/smp.c at line 359.
> > | > I'm testing various workarounds for that BUG now.
> > |
> > | I will eyeball it...
> > |
> > | Is it the kernel that is shutting down, or the kernel that is being
> > | brought up that has problems?
> >
> > the kernel that is shutting down.
> >
> > | The back trace from the BUG would be interesting.
> >
> > see below. my bad. i should have included it.
> >
> > | As I see it flush_tlb_others is being called when we have shutdown
> > | cpus and the kernel still thinks we have the mm present on foreign
> > | cpus.
> >
> > Martin Bligh thinks that there is a tlb race here.
> > I printed the 2 cpu masks on my dual-proc machine and saw
> > 0 in one of them and 0xc in the other one.
>
> Ouch we have both cpus running when this happens, and we have not
> started any shutdown whatsoever. This is the bit that sets up
> the page tables for later use...
>
> I think identity_map_pages will have problems with a kernel that does
> the 4G/4G split, and it has known issues on some other architectures,
> because they treat init_mm specially. So the proper solution may be
> to simply rewrite identity_map_pages.
>
> Before we do that in the short term we need to see if
> identity_map_pages is actually doing anything bad. You are
> not using the 4G/4G split so that is not the cause. So either
> init_mm is now special in some way, or we have hit a generic kernel
> bug.
>
> So this may indeed be a tlb race. But it is init_mm->cpu_vm_mask and
> cpu_online_map that are different. With the implication being
> that init_mm->cpu_vm_mask has cpus set that are not in cpu_online_map?
> Very weird especially on SMP.
>
> Without attribution I have a hard time making sense of which cpumask
> is which so I can't draw any conclusions. But I find it very
> interesting that it is bits 2 and 3 that are set. I wonder if
> there is any mixup between logical cpu identities and apic ids.
>
> Eric
--
Hariprasad Nellitheertha
Linux Technology Center
India Software Labs
IBM India, Bangalore