From: Frank Rowand <frowand.list@gmail.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: devicetree@vger.kernel.org, Rob Herring <robh+dt@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] of: allocate / free phandle cache outside of the devtree_lock
Date: Fri, 29 Nov 2019 20:48:05 -0600 [thread overview]
Message-ID: <1ce03fc5-e527-8bb7-e1af-4d05a9c000e5@gmail.com> (raw)
In-Reply-To: <20191129135756.tfmuygw4be6yftz5@linutronix.de>
Hi Sebastian,
On 11/29/19 7:57 AM, Sebastian Andrzej Siewior wrote:
> On 2019-11-12 16:48:12 [-0600], Frank Rowand wrote:
>> Hi Sebastian,
> Hi Frank,
>
>> On 11/11/19 11:21 AM, Sebastian Andrzej Siewior wrote:
>>> The phandle cache code allocates memory while holding devtree_lock which
>>> is a raw_spinlock_t. Memory allocation (and free()) is not possible on
>>> RT while a raw_spinlock_t is held.
>>> Invoke the kfree() and kcalloc() while the lock is dropped.
>>
>> I thought the GFP_ATOMIC passed to kcalloc() in of_populate_phandle_cache()
>> was sufficient. And I didn't realize (or remember) that kfree was
>> not allowed while a raw_spinlock_t is held. Do you have a
>> pointer to the preempt RT documentation that explains that?
>> I'd like to add that pointer to my personal notes about locking so
>> that I won't mis-remember this too often.
>
> Unfortunately not yet. I have a mini document that needs polishing…
> However, since as far as my RT memory goes:
> GFP_ATOMIC can be used in non-blocking context. On !RT that includes
> disabled preemption and/or interrupts _usually_ part of locking (like
> spin_lock_irqsave()) or the inherited context like IRQ-handler or
> softirq for instance.
> On RT the former example is not blocking anymore (sleeping spinlocks,
> threaded IRQs). The locking with raw_spinlock_t still disables
> preemption and interrupts as part of the procedure. The GFP_ATOMIC in
> this case does not matter on RT here in terms of blocking (the part with
> emergency pools and so on remains unchanged).
> The kfree() part within a raw_spinlock_t section has the same
> limitations on RT as kmalloc() for the same reason. The SLUB part of the
> function is fine. The problem is once SLUB has to call to the page
> allocated in order to return or ask for memory. We can't do this in
> blocking context.
Thanks for writing this up.
And thanks for digging even deeper into the relevant context than I
did below.
-Frank
>
> …
>>> | smp: Bringing up secondary CPUs ...
>>> | BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:973
>>> | in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
>>> | 1 lock held by swapper/1/0:
>>> | #0: c000000000def6e0 (devtree_lock){+.+.}, at: of_find_node_opts_by_path+0x1f4/0x230
>>> | Preemption disabled at:
>>> | [<c0000000000557a0>] start_secondary+0xd0/0x6a0
>>> |
>>> | Call Trace:
>>> | [c0000001f9667d10] [c000000000158e30] ___might_sleep+0x250/0x270
>>> | [c0000001f9667da0] [c000000000984f40] rt_spin_lock+0x70/0x90
>>> | [c0000001f9667de0] [c0000000007e3634] of_find_node_opts_by_path+0x1f4/0x230
>>> | [c0000001f9667e30] [c0000000007e3844] of_get_next_cpu_node+0x144/0x180
>>> | [c0000001f9667e70] [c0000000007e38d8] of_get_cpu_node+0x58/0x90
>>> | [c0000001f9667eb0] [c00000000002eb00] cpu_to_chip_id+0x20/0x70
>>> | [c0000001f9667ee0] [c000000000055858] start_secondary+0x188/0x6a0
>>> | [c0000001f9667f90] [c00000000000b554] start_secondary_prolog+0x10/0x14
>>>
>>> because cpu_to_chip_id() acquires devtree_lock() early in the CPU-bring
>>> up path.
>>
>> I read too much into that sentence, and ran off on a tangent re-educating
>> myself on preempt RT lock stuff.
>>
>> The issue in this path is that start_secondary() disables preemption before
>> going down the code path that ends up with an attempt by of_find_node_opts_by_path()
>> to lock devtree_lock. It is ok to acquire a raw spinlock with preemption
>> disabled, but not ok to acquire a normal spinlock with preemption disabled.
> That is correct. So
> spin_lock();
> spin_lock();
>
> is okay. However
> preempt_disable();
> spin_lock();
>
> is not okay. Same is true for s/preempt_disable/local_irq_disable()/.
>
> Please note the even before preempt_disable() in start_secondary() the
> whole code runs with disabled interrupts as part of CPU bring up.
>
>> The calling path to cpu_to_chip_id() has an intervening call that does not
>> show up in the above trace, add_cpu_to_masks(). The first call of cpu_to_chip_id()
>> is "int chipid = cpu_to_chip_id(cpu)", which could be moved out to start_secondary(),
>> before preemption is disabled. But at the end of add_cpu_to_masks() is:
>>
>> for_each_cpu(i, cpu_online_mask)
>> if (cpu_to_chip_id(i) == chipid)
>> set_cpus_related(cpu, i, cpu_core_mask);
>>
>> This use of cpu_to_chip_id() is a little harder to move to before the preemption,
>> but it is possible. A table of the chip ids for all possible cpus could be
>> created before disabling preemption, and the table could be passed into
>> add_cpu_to_masks(). This would allow devtree_lock to be changed to a
>> spinlock_t.
>>
>> I like this approach because it removes the one known place that constrains
>> what type of lock devtree_lock is.
>
> So even if we move that whole section before preempt_disable(), we still
> run with disabled interrupts and can't use spinlock_t.
>
>> My second choice (and I am willing to accept this) is:
>>
>>>
>>> drivers/of/base.c | 19 +++++++++++++------
>>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -138,31 +138,34 @@ static u32 phandle_cache_mask;
>>> /*
>>> * Caller must hold devtree_lock.
>>> */
>>
>> Add a one line comment to the effect of kfree()
>> can not occur while raw_spinlock_t held, so caller must
>> do the kfree().
>
> okay.
>
> …
>>> @@ -208,12 +212,14 @@ void of_populate_phandle_cache(void)
>>>
>>> if (!phandles)
>>> goto out;
>>
>> Add a one line comment to the effect of raw_spinlock_t can not be held
>> when calling kcalloc().
> okay.
>
>>> + raw_spin_unlock_irqrestore(&devtree_lock, flags);
>>>
>>> cache_entries = roundup_pow_of_two(phandles);
>>> phandle_cache_mask = cache_entries - 1;
>>>
>>
>> Need to avoid race with of_find_node_by_phandle(). So change the following
>> to tmp_phandle_cache = kcalloc(...
> okay.
>
>>> phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
>>> GFP_ATOMIC);
>>> + raw_spin_lock_irqsave(&devtree_lock, flags);
>>
>> Then here:
>>
>> phandle_cache = tmp_phandle_cache;
>>
>>> if (!phandle_cache)
>>> goto out;
>>>
>>> @@ -225,6 +231,7 @@ void of_populate_phandle_cache(void)
>>>
>>> out:
>>> raw_spin_unlock_irqrestore(&devtree_lock, flags);
>>> + kfree(shadow);
>>> }
>>>
>>> void __init of_core_init(void)
>>>
>>
>> The subtle race with of_find_node_by_phandle() is that if
>> of_find_node_by_phandle() added an entry to the cache it
>> also did an of_node_get(). It is ok that of_populate_phandle_cache()
>> overwrite the cache entry, but it would also do an additional
>> of_node_get().
> okay.
>
>> -Frank
>
> Sebastian
>
next prev parent reply other threads:[~2019-11-30 2:48 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-11 17:21 [PATCH] of: allocate / free phandle cache outside of the devtree_lock Sebastian Andrzej Siewior
2019-11-12 3:35 ` Rob Herring
2019-11-12 9:10 ` Sebastian Andrzej Siewior
2019-11-12 15:55 ` Rob Herring
2019-11-12 23:46 ` Frank Rowand
2019-11-13 0:48 ` Rob Herring
2019-11-13 16:52 ` Frank Rowand
2019-11-12 22:48 ` Frank Rowand
2019-11-29 13:57 ` Sebastian Andrzej Siewior
2019-11-30 2:48 ` Frank Rowand [this message]
2019-11-29 14:04 ` [PATCH v2] " Sebastian Andrzej Siewior
2019-11-30 2:46 ` Frank Rowand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1ce03fc5-e527-8bb7-e1af-4d05a9c000e5@gmail.com \
--to=frowand.list@gmail.com \
--cc=bigeasy@linutronix.de \
--cc=devicetree@vger.kernel.org \
--cc=robh+dt@kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).