From: Matthias Brugger <mbrugger@suse.com>
To: Marc Zyngier <marc.zyngier@arm.com>,
"takahiro.akashi@linaro.org" <takahiro.akashi@linaro.org>,
James Morse <james.morse@arm.com>,
Stefan Wahren <stefan.wahren@i2se.com>,
Petr Tesarik <ptesarik@suse.cz>,
kexec mailing list <kexec@lists.infradead.org>,
linux-arm-kernel@lists.infradead.org
Subject: Re: panic kexec broken on ARM64?
Date: Wed, 4 Jul 2018 16:08:38 +0200
Message-ID: <a2796dc8-2f0f-468f-0c9f-ec296f3b302a@suse.com>
In-Reply-To: <86601wzn3h.wl-marc.zyngier@arm.com>

On 03/07/18 10:58, Marc Zyngier wrote:
> On 03/07/18 08:01, takahiro.akashi@linaro.org wrote:
>> Marc, James,
>>
>> I'd like to re-ignite the discussion.
>>
>> On Sun, Jun 10, 2018 at 01:24:17PM +0100, Marc Zyngier wrote:
>>> On Wed, 06 Jun 2018 12:37:02 +0100,
>>> James Morse wrote:
>>>>
>>>> Hi Stefan,
>>>>
>>>> On 06/06/18 08:02, Stefan Wahren wrote:
>>>>> Am 05.06.2018 um 19:46 schrieb James Morse:
>>>>>> On 05/06/18 09:01, Petr Tesarik wrote:
>>>>>>> I attached a hardware debugger and found
>>>>>>> out that all CPU cores were stopped except one which was stuck in the
>>>>>>> idle thread. It seems that irq_set_irqchip_state() may sleep, which is
>>>>>>> definitely not safe after a kernel panic.
>>>>
>>>>>> I don't know much about irqchip stuff, but __irq_get_desc_lock() takes a
>>>>>> raw_spin_lock(), and calls gic_irq_get_irqchip_state() which is just poking
>>>>>> around in mmio registers, this should all be safe unless you re-entered the same
>>>>>> code.
>>>>
>>>>>>> If I'm right, then this is broken in general, but I have only ever seen
>>>>>>> it on RPi 3 Model B+ (even RPi3 Model B works fine), so the issue may
>>>>>>> be more subtle.
>>>>
>>>>>> Is there a hardware difference around the interrupt controller on these?
>>>>
>>>>> No, but the RPi 3 B has a different USB network chip on board (smsc95xx, Fast
>>>>> ethernet) instead of lan78xx (Gigabit ethernet).
>>>>
>>>> Bingo: it's the lan78xx driver that is sleeping from the irqchip
>>>> callbacks. The smsc95xx driver doesn't have a struct irq_chip, which
>>>> is why the RPi-3-B doesn't do this.
>>>>
>>>> It may be valid for kdump to only tear down the 'root irqdomain' (if
>>>> that even means anything). I assume these secondary irqchips would
>>>> have a summary-interrupt that goes to another irqchip. But I can't
>>>> see a way to tell them apart...
>>>
>>> There is none. A cascaded irqchip is just like a root irqchip, just
>>> that its output line is connected to another irqchip. But we have no
>>> easy way to identify the parent. Also, this particular driver looks
>>> quite creative (it reinvents the wheel for chained interrupts -- see
>>> intr_complete and lan78xx_status), meaning that even if we could have
>>> a magic way of identifying a chained irqchip, we'd miss that one. Broken.
>>>
>>>> I think we need to wait until after the merge window for Marc's
>>>> wisdom on this!
>>>
>>> Overall, I can't think of an easy fix. We have a few options, but none
>>> of them involve a centralised change:
>>>
>>> 1) We provide a reset infrastructure for irqchips, with an opt-in
>>> mechanism. This involves changing the way we tear down irqs at
>>> crash-time, and we'd then need some notion of reset ordering (think
>>> of the layered ITS and GICv3, for example).
>>
>> Does this mean that all the irqchips have to be implemented with reset?
>
> No. Only those that want to be reset at kexec time.
>
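
To make sure I read option (1) correctly, here is a minimal userspace
sketch of an opt-in reset with ordering. Nothing in it is kernel API:
struct irqchip_model, the crash_reset callback and the depth field are
names I made up for illustration, and resetting the irqchips furthest
from the CPU first is my assumption, not something stated above.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace model only; this is NOT the kernel's struct irq_chip. */
struct irqchip_model {
	const char *name;
	int depth;                                   /* hypothetical: 0 = root irqchip */
	void (*crash_reset)(struct irqchip_model *); /* NULL: chip did not opt in */
	int reset_seq;                               /* 0 until the chip is reset */
};

static int next_seq;

/* Example callback: record the order in which chips get reset. */
void model_reset(struct irqchip_model *chip)
{
	chip->reset_seq = ++next_seq;
	printf("reset %s\n", chip->name);
}

/*
 * Reset every opted-in chip, deepest level first (assumed ordering).
 * Chips without a callback are skipped. Returns the number of resets.
 */
size_t crash_reset_irqchips(struct irqchip_model *chips, size_t n)
{
	size_t done = 0;
	int max_depth = 0;

	for (size_t i = 0; i < n; i++)
		if (chips[i].depth > max_depth)
			max_depth = chips[i].depth;

	for (int depth = max_depth; depth >= 0; depth--)
		for (size_t i = 0; i < n; i++) {
			if (chips[i].depth != depth || !chips[i].crash_reset)
				continue;
			chips[i].crash_reset(&chips[i]);
			done++;
		}

	return done;
}

/* Sample data: two chips opt in, one does not. */
struct irqchip_model demo_chips[] = {
	{ "root",      0, model_reset, 0 },
	{ "layered",   1, model_reset, 0 },
	{ "no-opt-in", 1, NULL,        0 },
};
```

Calling crash_reset_irqchips(demo_chips, 3) resets "layered" and then
"root", and leaves "no-opt-in" untouched.
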
>>>
>>> 2) We provide a way to identify interrupts that are ultimately backed
>>> by a root controller, which implies walking down the hierarchy for
>>
>> To be clear, from bottom to top (or root), right?
>
> I'm not sure I understand your question. The idea is to walk the
> irq_data chain, until we hit a root irqchip. If we do hit one, we
> deactivate/eoi/disable this interrupt. If we don't, we do nothing.
>
> This would avoid the above brokenness, and still ensures that no
> interrupt reaches the CPU.
>
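
For the record, this is how I understand the walk described above, as
a minimal userspace sketch. The structs only model the idea and are
not the kernel definitions; the is_root flag stands in for the
per-irqchip flag from option (2) and its name is hypothetical, as are
find_root_level() and crash_quiesce_irq().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace model only; these are NOT the kernel's struct definitions. */
struct irq_chip {
	const char *name;
	bool is_root;                       /* hypothetical per-irqchip flag */
};

struct irq_data {
	const struct irq_chip *chip;
	const struct irq_data *parent_data; /* NULL at the end of the chain */
};

/*
 * Walk the chain starting at d and return the first level that is
 * backed by a root irqchip, or NULL if the chain never reaches one.
 */
const struct irq_data *find_root_level(const struct irq_data *d)
{
	for (; d; d = d->parent_data)
		if (d->chip && d->chip->is_root)
			return d;
	return NULL;
}

/*
 * Crash-time policy: quiesce the interrupt only if a root irqchip
 * backs it, otherwise do nothing. The printf stands in for the
 * deactivate/eoi/disable sequence. Returns true if the irq was handled.
 */
bool crash_quiesce_irq(const struct irq_data *d)
{
	const struct irq_data *root = find_root_level(d);

	if (!root)
		return false;
	printf("quiesce at %s\n", root->chip->name);
	return true;
}

/* Sample data: one stacked interrupt, one with no root behind it. */
static const struct irq_chip root_chip    = { "root-irqchip", true };
static const struct irq_chip stacked_chip = { "stacked-irqchip", false };
static const struct irq_chip usb_chip     = { "usb-irqchip", false };

static const struct irq_data root_level  = { &root_chip, NULL };
static const struct irq_data stacked_irq = { &stacked_chip, &root_level };
static const struct irq_data usb_irq     = { &usb_chip, NULL };
```

In this model crash_quiesce_irq(&stacked_irq) reaches the root level
and quiesces there, while crash_quiesce_irq(&usb_irq) returns false, so
the callbacks of a chip like the one in the USB driver are never
invoked.
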
>>
>>> each one of them. Fairly expensive, but minimal in way of changes
>>> in the crash code. Requires a per-irqchip flag, but ordering comes
>>> in for free.
>>>
>>> 3) We do the same as (2), but at the irqdomain level. Not sure that's
>>> any better, and it may be even more complicated and bring back some
>>> ordering issues.
>>
>> Do you think that the same thing may happen in case of pci/msi?
>> I have no confidence but MSI has some kind of irq domain hierarchy.
>
> Anything can happen, as people implement their interrupt infrastructure
> in weird and wonderful ways. So we need to be prepared for the worst.
>
> I've pushed 3 patches on a branch[1]. It is mostly untested, but it
> should allow the above RPi3 disaster to cope with kexec.
>
> M.
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/root-irqchip
>
I tried the kernel from this branch on my RPi 3 Model B+, but I wasn't able
to start the crash kernel. Unfortunately, I don't have a JTAG adapter to
check whether it hangs for the same reason.

Regards,
Matthias