From: Marc Zyngier <maz@kernel.org>
To: dann frazier <dann.frazier@canonical.com>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Sumit Garg <sumit.garg@linaro.org>,
	kernel-team@android.com, Russell King <linux@arm.linux.org.uk>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Fu Wei <fu.wei@linaro.org>
Subject: Re: [PATCH 08/11] irqchip/gic: Configure SGIs as standard interrupts
Date: Wed, 21 Apr 2021 16:49:03 +0100
Message-ID: <87wnsvprio.wl-maz@kernel.org>
In-Reply-To: <YIA8RCjoI+9nChN6@xps13.dannf>

On Wed, 21 Apr 2021 15:52:52 +0100,
dann frazier <dann.frazier@canonical.com> wrote:
> 
> [ + Fu Wei ]

[...]

> >
> > Please feed this stacktrace to scripts/decode_stacktrace.sh so that I
> > can get an idea about what is going wrong. I bet something is playing
> > ungodly games with one of the IPIs, and things go horribly wrong.
> 
> hey Marc,
>   Sure:
> 
> [    7.927289] Unable to handle kernel read from unreadable memory at virtual address 0000000000000028
> [    7.936326] Mem abort info:
> [    7.939108]   ESR = 0x96000004
> [    7.942151]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    7.947451]   SET = 0, FnV = 0
> [    7.950494]   EA = 0, S1PTW = 0
> [    7.953624] Data abort info:
> [    7.956492]   ISV = 0, ISS = 0x00000004
> [    7.960316]   CM = 0, WnR = 0
> [    7.963273] [0000000000000028] user address but active_mm is swapper
> [    7.969616] Internal error: Oops: 96000004 [#1] SMP
> [    7.974483] Modules linked in:
> [    7.977531] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc8 #19
> [    7.983874] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS F02 08/06/2019
> [    7.990737] pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
> [    7.996732] pc : __ipi_send_mask (/home/ubuntu/linux/./include/linux/irqdomain.h:537 /home/ubuntu/linux/kernel/irq/ipi.c:283) 
> [    8.000910] lr : smp_cross_call (/home/ubuntu/linux/arch/arm64/kernel/smp.c:958) 
> [    8.004913] sp : ffff800012753c10
> [    8.008216] x29: ffff800012753c10 x28: ffff000100de5d00
> [    8.013521] x27: 000000000000000a x26: ffff80001225da20
> [    8.018825] x25: 0000000000000000 x24: ffff000ff62719b0
> [    8.024129] x23: ffff80001225d000 x22: ffff800012368108
> [    8.029433] x21: ffff800010f69a20 x20: 0000000000000000
> [    8.034737] x19: ffff000100143c60 x18: 0000000000000020
> [    8.040041] x17: 000000008e74252f x16: 00000000bf0ab2ad
> [    8.045345] x15: ffffffffffffffff x14: 0000000000000000
> [    8.050649] x13: 003d090000000000 x12: 00003d0900000000
> [    8.055953] x11: 0000000000000000 x10: 00003d0900000000
> [    8.061257] x9 : ffff800010027f14 x8 : 0000000000000000
> [    8.066561] x7 : 00000000ffffffff x6 : ffff000ff6148698
> [    8.071865] x5 : ffff80001159d040 x4 : ffff80001159d110
> [    8.077169] x3 : ffff800010f69a00 x2 : 0000000000000000
> [    8.082473] x1 : ffff800010f69a20 x0 : 0000000000000000
> [    8.087777] Call trace:
> [    8.090213] __ipi_send_mask (/home/ubuntu/linux/./include/linux/irqdomain.h:537 /home/ubuntu/linux/kernel/irq/ipi.c:283) 

Thanks for that. This resolves to:

	if (irq_domain_is_ipi_per_cpu(data->domain)) {

data->domain is NULL, and we probably are using freed memory...
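
For readers wondering how the faulting address 0x28 ties in with a NULL
data->domain: irq_domain_is_ipi_per_cpu() reads the domain's flags field,
so a NULL base pointer makes the load hit the offset of that field. Below
is a minimal sketch, assuming the v5.12 field order of struct irq_domain
and 8-byte arm64 pointers. The struct mock_irq_domain and the helper
flags_load_address() are hypothetical stand-ins for illustration, not
kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the leading fields of struct irq_domain.
 * Assumption: v5.12 field order, arm64 layout. Pointer-sized fields
 * are modelled as uint64_t so the layout is the same on any host.
 */
struct mock_irq_domain {
	uint64_t link_next;	/* struct list_head link: first pointer */
	uint64_t link_prev;	/* struct list_head link: second pointer */
	uint64_t name;		/* const char *name */
	uint64_t ops;		/* const struct irq_domain_ops *ops */
	uint64_t host_data;	/* void *host_data */
	uint32_t flags;		/* the field irq_domain_is_ipi_per_cpu() tests */
};

/* Five 8-byte fields precede flags, so it sits at byte 40 (0x28). */
static_assert(offsetof(struct mock_irq_domain, flags) == 0x28,
	      "flags is expected at offset 0x28 in this model");

/* Address that a load of ->flags touches for a given domain base. */
static uint64_t flags_load_address(uint64_t domain_base)
{
	return domain_base + offsetof(struct mock_irq_domain, flags);
}
```

With a base of 0, flags_load_address() yields 0x28 under these
assumptions, which matches the address in the oops.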

> > Now, here's a hunch: in the fine TX1 tradition, the firmware is broken
> > and the GTDT table looks unusable. Amusingly, the crash happens right
> > after the SBSA watchdog fails to probe.
> 
> Yeah, I noticed that, but didn't highlight it as I didn't see it in
> the backtrace...
> 
> > And looking at the code that implements that driver, it looks dodgy as
> > hell, as it unmaps an interrupt it doesn't even know is valid. And it
> > does that right when the driver fails the way you experienced it. If,
> > by any chance, the interrupt field is 0 in the firmware table, this
> > results in SGI0 being unmapped. Given that this is the rescheduling
> > interrupt, fireworks happen.
> 
> ... and that explains why. I wouldn't have gotten there, but wish I'd
> thought to test w/ the watchdog compiled out :(

No worries. This IRQ series has uncovered a number of terrible driver
behaviours since I merged it, and these bugs are worth every penny.
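
To make the failure mode described above concrete, here is a rough sketch
of the pattern, with a guarded variant next to it. This is not the
patchlet referred to in this thread and not the driver's actual code.
Every name in it (mock_mapped, mock_map, mock_unmap, teardown_unguarded,
teardown_guarded) is made up for illustration, and the mapping state is
reduced to a plain array.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NR_IRQS 8

/* Hypothetical stand-in for the kernel's interrupt mapping state. */
static bool mock_mapped[MOCK_NR_IRQS];

/* Record that an interrupt number has a live mapping. */
static void mock_map(unsigned int irq)
{
	assert(irq < MOCK_NR_IRQS);
	mock_mapped[irq] = true;
}

/* Drop the mapping for an interrupt number, whoever owns it. */
static void mock_unmap(unsigned int irq)
{
	assert(irq < MOCK_NR_IRQS);
	mock_mapped[irq] = false;
}

/*
 * Unguarded error path: unmaps whatever number the firmware table
 * contained. If that number is 0, the mapping for interrupt 0 is
 * destroyed even though this code never created it.
 */
static void teardown_unguarded(unsigned int table_irq)
{
	mock_unmap(table_irq);
}

/*
 * Guarded error path: only undo a mapping that this code set up
 * itself, as tracked by the caller in mapped_by_us.
 */
static void teardown_guarded(unsigned int table_irq, bool mapped_by_us)
{
	if (mapped_by_us)
		mock_unmap(table_irq);
}
```

In this model, calling mock_map(0) to stand in for the SGI0 mapping and
then teardown_unguarded(0) clears mock_mapped[0], whereas
teardown_guarded(0, false) leaves it alone.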

> > Can you have a go with the patchlet below, and let me know if that
> > helps?
> 
> It does!

Awesome. I'll Cc you on the actual patch, feel free to respond with a
Tested-by: if you want.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.