Kexec Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <marc.zyngier@arm.com>
To: "takahiro.akashi@linaro.org" <takahiro.akashi@linaro.org>,
	James Morse <james.morse@arm.com>,
	Stefan Wahren <stefan.wahren@i2se.com>,
	Petr Tesarik <ptesarik@suse.cz>,
	Matthias Brugger <mbrugger@suse.com>,
	kexec mailing list <kexec@lists.infradead.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: panic kexec broken on ARM64?
Date: Thu, 5 Jul 2018 11:19:43 +0100	[thread overview]
Message-ID: <a95ec854-ff32-c808-befe-b2db112f6c1e@arm.com> (raw)
In-Reply-To: <20180705101326.GM28220@linaro.org>

On 05/07/18 11:13, takahiro.akashi@linaro.org wrote:
> On Wed, Jul 04, 2018 at 10:02:24AM +0100, Marc Zyngier wrote:
>> On 04/07/18 09:41, takahiro.akashi@linaro.org wrote:
>>> On Tue, Jul 03, 2018 at 09:58:44AM +0100, Marc Zyngier wrote:
>>>> On 03/07/18 08:01, takahiro.akashi@linaro.org wrote:
>>>>> Marc, James,
>>>>>
>>>>> I'd like to re-ignite the discussion.
>>>>>
>>>>> On Sun, Jun 10, 2018 at 01:24:17PM +0100, Marc Zyngier wrote:
>>>>>> On Wed, 06 Jun 2018 12:37:02 +0100,
>>>>>> James Morse wrote:
>>>>>>>
>>>>>>> Hi Stefan,
>>>>>>>
>>>>>>> On 06/06/18 08:02, Stefan Wahren wrote:
>>>>>>>> Am 05.06.2018 um 19:46 schrieb James Morse:
>>>>>>>>> On 05/06/18 09:01, Petr Tesarik wrote:
>>>>>>>>>> I attached a hardware debugger and found
>>>>>>>>>> out that all CPU cores were stopped except one which was stuck in the
>>>>>>>>>> idle thread. It seems that irq_set_irqchip_state() may sleep, which is
>>>>>>>>>> definitely not safe after a kernel panic.
>>>>>>>
>>>>>>>>> I don't know much about irqchip stuff, but __irq_get_desc_lock() takes a
>>>>>>>>> raw_spin_lock(), and calls gic_irq_get_irqchip_state() which is just poking
>>>>>>>>> around in mmio registers, this should all be safe unless you re-entered the same
>>>>>>>>> code.
>>>>>>>
>>>>>>>>>> If I'm right, then this is broken in general, but I have only ever seen
>>>>>>>>>> it on RPi 3 Model B+ (even RPi3 Model B works fine), so the issue may
>>>>>>>>>> be more subtle.
>>>>>>>
>>>>>>>>> Is there a hardware difference around the interrupt controller on these?
>>>>>>>
>>>>>>>> No, but the RPi 3 B has a different USB network chip on board (smsc95xx, Fast
>>>>>>>> ethernet) instead of lan78xx (Gigabit ethernet).
>>>>>>>
>>>>>>> Bingo: its the lan78xx driver that is sleeping from the irqchip
>>>>>>> callbacks; The smsc95xx driver doesn't have a struct irq_chip, which
>>>>>>> is why the RPi-3-B doesn't do this.
>>>>>>>
>>>>>>> It may be valid for kdump to only teardown the 'root irqdomain' (if
>>>>>>> that even means anything). I assume these secondary irqchip's would
>>>>>>> have a summary-interrupt that goes to another irqchip. But I can't
>>>>>>> see a way to tell them apart..,
>>>>>>
>>>>>> There is none. A cascaded irqchip is just like a root irqchip, just
>>>>>> that its output line is connected to another irqchip. But we have no
>>>>>> easy way to identify the parent. Also, this particular driver looks
>>>>>> quite creative (it reinvents the wheel for chained interrupts -- see
>>>>>> intr_complete and lan78xx_status), meaning that even if we could have
>>>>>> a magic way of identify a chained irqchip, we'd miss that one. Broken.
>>>>>>
>>>>>>> I think we need to wait until after the merge window for Marc's
>>>>>>> wisdom on this!
>>>>>>
>>>>>> Overall, I can't think of an easy fix. We have a few options, but none
>>>>>> of them involve a centralised change:
>>>>>>
>>>>>> 1) We provide a reset infrastructure for irqchips, with an opt-in
>>>>>>    mechanism. This involves changing the way we teardown irqs at
>>>>>>    crash-time, and we'd then need some notion of reset ordering (think
>>>>>>    of the layered ITS and GICv3, for example).
>>>>>
>>>>> Does this mean that all the irqchips have to be implemented with reset?
>>>>
>>>> No. Only those that want to be reset at kexec time.
>>>
>>> I don't get the point yet. Who should have reset interface?
>>> What is the criteria?
>>
>> The criteria is "this irqchip requires a reset to be safely used in the
>> secondary kernel". This is a judgement call from the person writing the
>> driver.
> 
> This doesn't tell me anything more than "do it if you need it."
> So let me ask you in other words.
> Does gic driver need to provide a reset function?
> Whether yes or no, why do you think so?

Because I know the architecture and I can assess that it needs it. Case
in point: The RDs have memory tables. kexec without disabling LPIs, and
you end-up with memory corruption.

Sorry, but there is no magic bullet. You have to understand what you're
doing.

> 
>>
>>>
>>>>>>
>>>>>> 2) We provide a way to identify interrupts that are ultimately backed
>>>>>>    by a root controller, which implies walking down the hierarchy for
>>>>>
>>>>> To be clear, from bottom to top (or root), right?
>>>>
>>>> I'm not sure I understand your question. The idea is to walk the
>>>> irq_data chain, until we hit a root irqchip. If we do hit one, we
>>>> deactivate/eoi/disable this interrupt. If we don't, we do nothing.
>>>
>>> I thought that we would traverse the (chained irq) hierarchy from
>>> bottom to top and call deactivate or others in that order.
>>> Am I wrong here?
>>
>> You *cannot* traverse a hierarchy through a chained irq. At that stage,
>> irq_data->parent_data is NULL.
> 
> What do you mean by "at that stage?"

At the point you reach a secondary interrupt controller that exposese a
chained IRQ.

> In your patch, "genirq: Add predicate for irq served by root irqchip,"
> irq_irqchip_is_root() dereferences its parent_data.

And? Hierarchies and chained controllers are two orthogonal concepts.
You can have a hierarchy on top of a chained controller, and that
doesn't make it a root controller.

	M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  reply	other threads:[~2018-07-05 10:19 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-05  8:01 panic kexec broken on ARM64? Petr Tesarik
2018-06-05 17:46 ` James Morse
2018-06-06  7:02   ` Stefan Wahren
2018-06-06  8:00     ` Petr Tesarik
2018-06-06 11:41       ` Petr Tesarik
2018-06-06 11:37     ` James Morse
2018-06-10 12:24       ` Marc Zyngier
2018-07-03  7:01         ` takahiro.akashi
2018-07-03  8:58           ` Marc Zyngier
2018-07-04  8:41             ` takahiro.akashi
2018-07-04  9:02               ` Marc Zyngier
2018-07-05 10:13                 ` takahiro.akashi
2018-07-05 10:19                   ` Marc Zyngier [this message]
2018-08-02 15:49                     ` David Woodhouse
2018-08-03  6:06                       ` Marc Zyngier
2018-07-04 12:47               ` James Morse
2018-07-05 10:18                 ` takahiro.akashi
2018-07-04 14:08             ` Matthias Brugger
2018-07-04 14:20               ` Marc Zyngier
2018-06-06  5:36 ` Bhupesh Sharma
2018-06-06  7:58   ` Petr Tesarik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a95ec854-ff32-c808-befe-b2db112f6c1e@arm.com \
    --to=marc.zyngier@arm.com \
    --cc=james.morse@arm.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=mbrugger@suse.com \
    --cc=ptesarik@suse.cz \
    --cc=stefan.wahren@i2se.com \
    --cc=takahiro.akashi@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox