From: Hariprasad Nellitheertha <hari@in.ibm.com>
To: "Martin J. Bligh" <mbligh@aracnet.com>
Cc: Andrew Morton <akpm@osdl.org>,
rddunlap@osdl.org, linux-kernel@vger.kernel.org,
apw@shadowen.org, jamesclv@us.ibm.com
Subject: Re: BUG_ON(!cpus_equal(cpumask, tmp));
Date: Wed, 31 Mar 2004 10:13:04 +0530 [thread overview]
Message-ID: <20040331044304.GA5167@in.ibm.com> (raw)
In-Reply-To: <276260000.1080697873@flay>
There are actually two different problems here, and the BUG_ON is
hit in both cases.
1) In INIT_MM(), we now do this
.cpu_vm_mask = CPU_MASK_ALL,
With this, if we enter flush_tlb_others with init_mm, all bits except the one
corresponding to the current cpu are set. For example, on my UNI machine
running an SMP kernel with NR_CPUS=8, cpumask is 0xfe. The BUG_ON is hit
even if we are doing nothing related to shutdown or cpu-offlining (like
when we are just loading a new kernel into memory using kexec).
2) The issue of a race between taking CPUs offline and the tlbflush code. The
discussions have been centered around this issue.
The patch I sent across, though, is completely targetted towards issue 1.
Regards, Hari
On Tue, Mar 30, 2004 at 05:51:13PM -0800, Martin J. Bligh wrote:
> >> I think we're assuming that we don't have to because the problem is fixed
> >> by the "cpus_and(tmp, cpumask, cpu_online_map)" in flush_tlb_others so we
> >> don't have to. Except it's racy, and doesn't work.
> >
> > And it's a kludge, to work around dangling references to a CPU which has
> > gone away.
>
> Yes ;-)
>
> >> It would seem to me that your suggestion would fix it. But isn't locking
> >> cpu_online_map both simpler and (most importantly) more generic? I can't
> >> imagine that we don't use this elsewhere ... suppose for instance we took
> >> a timer interrupt, causing a scheduler rebalance, and moved a process to
> >> an offline CPU at that point? Isn't any user of smp_call_function also racy?
> >
> > If we have to add any fastpath locking to cope with CPU removal or reboot
> > then it's time to make CONFIG_HOTPLUG_CPU dependent upon CONFIG_BROKEN.
>
> Yeah, but as we've proved, it's not just hotplug, it's shutdown. And I don't
> think we can make that depend on CONFIG_BROKEN ;-) I don't see a *read*
> side RCU lock as an impostion on the fastpath (for reading cpu_online_map),
> and I don't care if writing to cpu_online_map is slower. A spinlock would
> be crappy, yes ...
>
> > yes, cpu_online_map should be viewed as a reference to the going-away CPU
> > for smp_call_function purposes. However the CPU takedown code appears to
> > do the right thing: it removes the cpu from cpu_online_map first, then does
> > the stop_machine() thing which should ensure that all other CPUs have
> > completed any cross-CPU call which they were doing, yes?
>
> Andy almost managed to convince me that the smp_call_function stuff is safe,
> based on call_lock exclusion. Except that we count that cpu stuff outside
> it ... but that's trivial to fix, we just move it inside the lock (patch
> below - untested, but trivial).
>
> He also pointed out that we could fairly easily fix the tlb stuff by
> taking the tlb lock before taking a cpu offline. Which still doesn't
> make me desperately comfortable ... but then he's smarter than me ;-)
> To me it comes down to ... do we want to lock the damned thing, or fix
> all the callers to be really, really careful?
>
> diff -purN -X /home/mbligh/.diff.exclude virgin/arch/i386/kernel/smp.c smp_call_function/arch/i386/kernel/smp.c
> --- virgin/arch/i386/kernel/smp.c 2004-03-11 14:33:36.000000000 -0800
> +++ smp_call_function/arch/i386/kernel/smp.c 2004-03-30 17:43:34.000000000 -0800
> @@ -514,10 +514,7 @@ int smp_call_function (void (*func) (voi
> */
> {
> struct call_data_struct data;
> - int cpus = num_online_cpus()-1;
> -
> - if (!cpus)
> - return 0;
> + int cpus;
>
> data.func = func;
> data.info = info;
> @@ -527,6 +524,10 @@ int smp_call_function (void (*func) (voi
> atomic_set(&data.finished, 0);
>
> spin_lock(&call_lock);
> + cpus = num_online_cpus()-1;
> + if (!cpus)
> + return 0;
> +
> call_data = &data;
> mb();
>
>
--
Hariprasad Nellitheertha
Linux Technology Center
India Software Labs
IBM India, Bangalore
next prev parent reply other threads:[~2004-03-31 4:43 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-03-29 15:39 BUG_ON(!cpus_equal(cpumask, tmp)); Martin J. Bligh
2004-03-30 0:21 ` Andrew Morton
2004-03-30 0:25 ` Andrew Morton
2004-03-30 13:28 ` Hariprasad Nellitheertha
2004-03-30 23:17 ` Randy.Dunlap
2004-03-31 0:22 ` Martin J. Bligh
2004-03-31 0:39 ` Andrew Morton
2004-03-31 0:57 ` Martin J. Bligh
2004-03-31 1:11 ` Andrew Morton
2004-03-31 1:24 ` Martin J. Bligh
2004-03-31 1:36 ` Andrew Morton
2004-03-31 1:51 ` Martin J. Bligh
2004-03-31 4:43 ` Hariprasad Nellitheertha [this message]
2004-04-01 0:31 ` Andy Whitcroft
2004-04-01 5:04 ` Srivatsa Vaddagiri
2004-04-01 11:38 ` Andy Whitcroft
2004-04-02 18:33 ` Randy.Dunlap
2004-04-01 8:42 ` Paul Jackson
2004-04-01 13:57 ` Hariprasad Nellitheertha
2004-04-03 1:45 ` Andy Whitcroft
2004-03-31 1:01 ` Andy Whitcroft
-- strict thread matches above, loose matches on Subject: below --
2004-01-02 23:51 Martin J. Bligh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20040331044304.GA5167@in.ibm.com \
--to=hari@in.ibm.com \
--cc=akpm@osdl.org \
--cc=apw@shadowen.org \
--cc=jamesclv@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mbligh@aracnet.com \
--cc=rddunlap@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.