linux-riscv.lists.infradead.org archive mirror
From: "Björn Töpel" <bjorn@kernel.org>
To: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Rob Herring <robh+dt@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Conor Dooley <conor+dt@kernel.org>,
	devicetree@vger.kernel.org,
	Saravana Kannan <saravanak@google.com>,
	Marc Zyngier <maz@kernel.org>, Anup Patel <anup@brainfault.org>,
	linux-kernel@vger.kernel.org, Atish Patra <atishp@atishpatra.org>,
	linux-riscv@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,
	Andrew Jones <ajones@ventanamicro.com>
Subject: Re: [PATCH v12 00/25] Linux RISC-V AIA Support
Date: Thu, 01 Feb 2024 19:45:47 +0100
Message-ID: <87v878dnvo.fsf@all.your.base.are.belong.to.us>
In-Reply-To: <CAK9=C2UX0sRb5UbLdm8xwe1dP=x+enJRYzAuCPf6MdHTLTC_Cw@mail.gmail.com>

Anup Patel <apatel@ventanamicro.com> writes:

> On Tue, Jan 30, 2024 at 11:19 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Anup Patel <apatel@ventanamicro.com> writes:
>>
>> > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote:
>> >>
>> >> Björn Töpel <bjorn@kernel.org> writes:
>> >>
>> >> > Anup Patel <apatel@ventanamicro.com> writes:
>> >> >
>> >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>> >> >>>
>> >> >>> Björn Töpel <bjorn@kernel.org> writes:
>> >> >>>
>> >> >>> > Anup Patel <apatel@ventanamicro.com> writes:
>> >> >>> >
>> >> >>> >> The RISC-V AIA specification is ratified as per the RISC-V International
>> >> >>> >> process. The latest ratified AIA specification can be found at:
>> >> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >> >>> >>
>> >> >>> >> At a high-level, the AIA specification adds three things:
>> >> >>> >> 1) AIA CSRs
>> >> >>> >>    - Improved local interrupt support
>> >> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> >> >>> >>    - Per-HART MSI controller
>> >> >>> >>    - Support MSI virtualization
>> >> >>> >>    - Support IPI along with virtualization
>> >> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> >> >>> >>    - Wired interrupt controller
>> >> >>> >>    - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> >> >>> >>    - In Direct-mode, injects external interrupts directly into HARTs
>> >> >>> >>
>> >> >>> >> For an overview of the AIA specification, refer to the AIA virtualization
>> >> >>> >> talk at KVM Forum 2022:
>> >> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
>> >> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo
>> >> >>> >>
>> >> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>> >> >>> >>
>> >> >>> >> These patches can also be found in the riscv_aia_v12 branch at:
>> >> >>> >> https://github.com/avpatel/linux.git
>> >> >>> >>
>> >> >>> >> Changes since v11:
>> >> >>> >>  - Rebased on Linux-6.8-rc1
>> >> >>> >>  - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>> >> >>> >>    MSI handling to per device MSI domains" series by Thomas.
>> >> >>> >>    (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>> >> >>> >>     PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>> >> >>> >>     https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
>> >> >>> >>  - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> >> >>> >>  - Updated IMSIC driver to support per-device MSI domains for PCI and
>> >> >>> >>    platform devices.
>> >> >>> >
>> >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >> >>> >
>> >> >>> > I'm hitting a boot hang in text patching, with this series applied on
>> >> >>> > 6.8-rc2. IPI issues?
>> >> >>>
>> >> >>> Not text patching! One CPU is spinning in smp_call_function_many_cond() and
>> >> >>> the others are in cpu_relax(). Smells like IPI...
>> >> >>
>> >> >> I tried bootefi from U-Boot multiple times but can't reproduce the
>> >> >> issue you are seeing.
>> >> >
>> >> > Thanks! I can reproduce without EFI, and with a simpler command line:
>> >> >
>> >> > qemu-system-riscv64 \
>> >> >   -bios /path/to/fw_dynamic.bin \
>> >> >   -kernel /path/to/Image \
>> >> >   -append 'earlycon console=tty0 console=ttyS0' \
>> >> >   -machine virt,aia=aplic-imsic \
>> >> >   -no-reboot -nodefaults -nographic \
>> >> >   -smp 4 \
>> >> >   -object rng-random,filename=/dev/urandom,id=rng0 \
>> >> >   -device virtio-rng-device,rng=rng0 \
>> >> >   -m 4G -chardev stdio,id=char0 -serial chardev:char0
>> >> >
>> >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
>> >> > gist [1], and all latest QEMU/OpenSBI:
>> >> >
>> >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
>> >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
>> >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
>> >> >
>> >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
>> >> > panicking about missing root mount ;-))
>> >>
>> >> More context: the hang is during a late initcall, where an ftrace direct
>> >> (register_ftrace_direct()) modification is done.
>> >>
>> >> Stop machine is used to call into __ftrace_modify_call(). Then into the
>> >> arch specific patch_text_nosync(), where flush_icache_range() hangs in
>> >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
>> >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
>> >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
>> >> right before the end of the function.
>> >>
>> >> Any ideas? Disabling CONFIG_HID_BPF, which does the early ftrace code
>> >> patching, fixes the boot hang, but it does seem related to IPI...
>> >>
>> > Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
>> > and friends) correctly.
>> >
>> > On the other hand, flush_icache_mm() does the right thing by
>> > doing local flush on the current CPU and IPI based flush on other
>> > CPUs.
>> >
>> > Can you try the following patch ?
>> >
>> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> > index 55a34f2020a8..a3dfbe4de832 100644
>> > --- a/arch/riscv/mm/cacheflush.c
>> > +++ b/arch/riscv/mm/cacheflush.c
>> > @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
>> >
>> >  void flush_icache_all(void)
>> >  {
>> > +    cpumask_t others;
>> > +
>> >      local_flush_icache_all();
>> >
>> > +    cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
>> > +    if (cpumask_empty(&others))
>> > +        return;
>> > +
>> >      if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
>> > -        sbi_remote_fence_i(NULL);
>> > +        sbi_remote_fence_i(&others);
>> >      else
>> > -        on_each_cpu(ipi_remote_fence_i, NULL, 1);
>> > +        on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
>> >  }
>> >  EXPORT_SYMBOL(flush_icache_all);
>>
>> Unfortunately, I see the same hang. LMK if you'd like me to try anything
>> else.
>
> I was able to reproduce this at my end but I had to use your config.
>
> Digging further, it seems the issue is observed only when we use
> in-kernel IPIs for cache flushing (instead of SBI calls) along with
> some of the tracers (or debugging features) enabled. With the tracers
> (or debug features) disabled we don't see any issue. In fact, the
> upstream defconfig works perfectly fine with AIA drivers and
> in-kernel IPIs.

Same here. I only see the issue for *one* scenario. Other than that
scenario, AIA is working fine! We're doing ftrace text patching, and I
wonder if this is the issue. RISC-V (unfortunately) still relies on
stop_machine() text patching (which will change!).

Again, the hang is in stop_machine() context, where interrupts should
very much be disabled, right? So an IPI sent from there can never be
taken by the other CPUs.

Dumping mstatus in QEMU:
  | mstatus  0000000a000000a0
  | mstatus  0000000a000000a0
  | mstatus  0000000a000000a0
  | mstatus  0000000a000000a0

Indeed, sstatus.SIE (bit 1) is 0 on all four harts.
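
For reference, the dumped value can be decoded with a small user-space
sketch. The bit positions (SIE = bit 1, MIE = bit 3, SPIE = bit 5,
MPIE = bit 7, UXL in bits 33:32, SXL in bits 35:34) are taken from the
RISC-V privileged specification for RV64. The macro and helper names are
made up for illustration; they are not kernel or QEMU APIs.

```c
#include <stdint.h>

/* mstatus bit positions for RV64, per the RISC-V privileged spec. */
#define MSTATUS_SIE_BIT    1   /* supervisor interrupt enable */
#define MSTATUS_MIE_BIT    3   /* machine interrupt enable */
#define MSTATUS_SPIE_BIT   5   /* previous SIE, saved on trap entry */
#define MSTATUS_MPIE_BIT   7   /* previous MIE, saved on trap entry */
#define MSTATUS_UXL_SHIFT  32  /* 2-bit field: U-mode XLEN encoding */
#define MSTATUS_SXL_SHIFT  34  /* 2-bit field: S-mode XLEN encoding */

/* Hypothetical helper: return a single bit of a CSR value. */
static inline unsigned int csr_bit(uint64_t csr, unsigned int bit)
{
	return (unsigned int)((csr >> bit) & 1u);
}

/* Hypothetical helper: return a 2-bit field of a CSR value. */
static inline unsigned int csr_field2(uint64_t csr, unsigned int shift)
{
	return (unsigned int)((csr >> shift) & 3u);
}
```

With the value above, csr_bit(0x0000000a000000a0ULL, MSTATUS_SIE_BIT)
is 0, while the SPIE and MPIE bits are 1 and both XLEN fields read 2
(64-bit).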

Seems like the bug is that text patching is trying to issue an IPI:
  | [<ffffffff801145d4>] smp_call_function_many_cond+0x81e/0x8ba
  | [<ffffffff80114716>] on_each_cpu_cond_mask+0x3e/0xde
  | [<ffffffff80013968>] flush_icache_all+0x98/0xc4
  | [<ffffffff80009c26>] patch_text_nosync+0x7c/0x146
  | [<ffffffff80ef9116>] __ftrace_modify_call.constprop.0+0xca/0x120
  | [<ffffffff80ef918c>] ftrace_update_ftrace_func+0x20/0x40
  | [<ffffffff80efb8ac>] ftrace_modify_all_code+0x5a/0x1d8
  | [<ffffffff80efba50>] __ftrace_modify_code+0x26/0x42
  | [<ffffffff80131734>] multi_cpu_stop+0x14e/0x1d8
  | [<ffffffff8013107a>] cpu_stopper_thread+0x9e/0x182
  | [<ffffffff80077a04>] smpboot_thread_fn+0xf8/0x1d2
  | [<ffffffff800718fc>] kthread+0xe8/0x108
  | [<ffffffff80f1cde6>] ret_from_fork+0xe/0x20
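
To illustrate the suspicion above, here is a toy user-space model of the
wait, not kernel code. It assumes the sender marks a per-CPU "locked"
flag, raises an IPI, and then waits for every target to clear the flag,
while a target only handles the IPI when its interrupts are enabled. All
names (toy_cpu, toy_send_ipis, toy_wait_all) are hypothetical, and the
real csd_lock_wait() spins without a bound; the step limit here exists
only so the model terminates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU; this is not a kernel structure. */
struct toy_cpu {
	bool irqs_enabled;  /* stands in for sstatus.SIE */
	bool ipi_pending;   /* IPI raised but not yet handled */
	bool csd_locked;    /* flag the sender waits on */
};

/* Sender: queue the call and raise an IPI on every CPU except self. */
static void toy_send_ipis(struct toy_cpu *cpus, size_t n, size_t self)
{
	for (size_t i = 0; i < n; i++) {
		if (i == self)
			continue;
		cpus[i].csd_locked = true;
		cpus[i].ipi_pending = true;
	}
}

/* Receiver: the IPI is only taken while interrupts are enabled. */
static void toy_cpu_step(struct toy_cpu *cpu)
{
	if (cpu->irqs_enabled && cpu->ipi_pending) {
		cpu->ipi_pending = false;
		cpu->csd_locked = false;  /* handler ran, sender is released */
	}
}

/*
 * Bounded stand-in for the sender's wait loop. Returns true if every
 * target released its flag within max_steps, false otherwise.
 */
static bool toy_wait_all(struct toy_cpu *cpus, size_t n, size_t self,
			 unsigned int max_steps)
{
	for (unsigned int step = 0; step < max_steps; step++) {
		bool busy = false;

		for (size_t i = 0; i < n; i++) {
			if (i == self)
				continue;
			toy_cpu_step(&cpus[i]);
			if (cpus[i].csd_locked)
				busy = true;
		}
		if (!busy)
			return true;
	}
	return false;
}
```

With all four toy CPUs at irqs_enabled == false, as in the mstatus dump,
toy_wait_all() never succeeds no matter how large max_steps is. With
interrupts enabled on the targets it completes on the first pass.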


Björn


