From: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Black <daniel.black@au1.ibm.com>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] powerpc: Drop lockdep_assert_cpus_held call from arch_update_cpu_topology
Date: Thu, 05 Oct 2017 15:26:01 -0300 [thread overview]
Message-ID: <87o9pl5u4m.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1710051435530.2083@nanos>
Hello Thomas,
Thanks for your comments.
Thomas Gleixner <tglx@linutronix.de> writes:
> On Wed, 4 Oct 2017, Thiago Jung Bauermann wrote:
>
>> It turns out that not all paths calling arch_update_cpu_topology hold
>> cpu_hotplug_lock, but that's ok because those paths aren't supposed to race
>> with any concurrent hotplug events.
>>
>> Callers of arch_update_cpu_topology are expected to know what they are
>> doing when they call the function without holding the lock, so remove the
>> lockdep warning.
>
> "Expected to know what they are doing" is not really a good approach as
> it's way too simple to screw things up.
I agree.
> You might consider to have two functions where one does the check and the
> other does not, but I leave that to the PPC maintainers.
It doesn't look like powerpc uses arch_update_cpu_topology differently
than other arches. Are you saying that all callers of the function
should be holding cpu_hotplug_lock? I came to the conclusion that this
isn't the case because of two things:
1. This comment in sched_init_smp:
/*
* There's no userspace yet to cause hotplug operations; hence all the
* CPU masks are stable and all blatant races in the below code cannot
* happen.
*/
mutex_lock(&sched_domains_mutex);
sched_init_domains(cpu_active_mask);
cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
if (cpumask_empty(non_isolated_cpus))
cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
mutex_unlock(&sched_domains_mutex);
This is relevant for the following call trace:
[c000003ff6107b50] [c00000000010ac70] lockdep_assert_cpus_held+0x50/0x70 (unreliable)
[c000003ff6107b70] [c0000000000802c0] arch_update_cpu_topology+0x20/0x40
[c000003ff6107b90] [c000000000182ec4] sched_init_domains+0x74/0x100
[c000003ff6107bd0] [c000000000ed5c78] sched_init_smp+0x58/0x168
[c000003ff6107d00] [c000000000eb460c] kernel_init_freeable+0x1fc/0x3ac
[c000003ff6107dc0] [c00000000000dc2c] kernel_init+0x2c/0x150
[c000003ff6107e30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
2. Your comment on patch "lockup_detector: Cleanup hotplug locking mess"¹:
"All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly."
Though TBH I'm still grasping the smpboot thread infrastructure and
I'm not sure how it handles serialization against CPU hotplug.
This is relevant for the following call trace:
[c000001ff257ba00] [c0000000000f8618] lockdep_assert_cpus_held+0x48/0x80 (unreliable)
[c000001ff257ba20] [c00000000007a848] arch_update_cpu_topology+0x18/0x30
[c000001ff257ba40] [c00000000016bd0c] partition_sched_domains+0x8c/0x4e0
[c000001ff257baf0] [c00000000020a914] cpuset_update_active_cpus+0x24/0x60
[c000001ff257bb10] [c0000000001449bc] sched_cpu_deactivate+0xfc/0x1a0
[c000001ff257bc20] [c0000000000f5a8c] cpuhp_invoke_callback+0x19c/0xe00
[c000001ff257bcb0] [c0000000000f6868] cpuhp_down_callbacks+0x78/0xf0
[c000001ff257bd00] [c0000000000f6c1c] cpuhp_thread_fun+0x1fc/0x210
[c000001ff257bd50] [c000000000132f4c] smpboot_thread_fn+0x2fc/0x370
[c000001ff257bdc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001ff257be30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
>From what you said though it looks like the lockdep assertion shouldn't
have been triggered in either case. I'll keep digging.
--
Thiago Jung Bauermann
IBM Linux Technology Center
¹ https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-September/163477.html
next prev parent reply other threads:[~2017-10-05 18:26 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-05 0:04 [PATCH] powerpc: Drop lockdep_assert_cpus_held call from arch_update_cpu_topology Thiago Jung Bauermann
2017-10-05 12:38 ` Thomas Gleixner
2017-10-05 18:26 ` Thiago Jung Bauermann [this message]
2017-10-05 18:46 ` Thomas Gleixner
2017-10-10 9:48 ` Michael Ellerman
2017-10-12 0:20 ` Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o9pl5u4m.fsf@linux.vnet.ibm.com \
--to=bauerman@linux.vnet.ibm.com \
--cc=daniel.black@au1.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).