From: Thomas Gleixner <tglx@kernel.org>
To: Yicong Yang <yang.yicong@picoheart.com>,
Anup Patel <apatel@ventanamicro.com>
Cc: yang.yicong@picoheart.com, anup@brainfault.org, pjw@kernel.org,
palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
geshijian@picoheart.com, weidong.wd@picoheart.com,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Danilo Krummrich <dakr@kernel.org>
Subject: Re: [PATCH] irqchip/riscv-aplic: Register the driver prior to device creation
Date: Wed, 14 Jan 2026 20:50:32 +0100
Message-ID: <877btkht2v.ffs@tglx>
In-Reply-To: <7b859dd5-9262-4d68-9a8e-e0be0c24ac4a@picoheart.com>
On Wed, Jan 14 2026 at 19:48, Yicong Yang wrote:
> On 1/14/26 4:57 PM, Anup Patel wrote:
>> On Wed, Jan 14, 2026 at 12:08 PM Yicong Yang <yang.yicong@picoheart.com> wrote:
>>>
>>> On RISC-V the APLIC serves part of the GSI interrupts, but unlike
>>> other architectures it's initialized rather late on ACPI-based
>>> systems:
>>> - the spec only mandates that it be reported in the DSDT (riscv-brs
>>> rule AML_100), so the APLIC is created as a platform_device when
>>> the DSDT is scanned
>>> - the driver is registered and initializes the device at the
>>> device_initcall stage
>>>
>>> The creation of devices that depend on the APLIC is deferred until
>>> the APLIC is initialized (when the driver calls
>>> acpi_dev_clear_dependencies()), unlike most other devices, which are
>>> created when the DSDT is scanned. The affected devices include those
>>> that declare the dependency explicitly via the ACPI _DEP method, PCIe
>>> host bridges via _PRT, and those that require their interrupts as
>>> GSIs. Furthermore, the deferred creation is performed asynchronously
>>> (queued on the system_dfl_wq workqueue), but all items contend on
>>> the acpi_scan_lock.
The lock contention is irrelevant to the real underlying problem.
>>> Since the deferred device creation is asynchronous and contends for
>>> the same lock, the order and timing are not deterministic. It also
>>> happens late enough that device creation runs in parallel with the
>>> init task. This leads to the following issues (also observed on our
>>> platforms):
>>> - the console/tty device is created late and is sometimes not ready
>>> when the init task checks for its presence. The system crashes in
>>> the latter case since the init task always requires a valid console.
>>> - the root device is probed and registered late (e.g. NVMe, after
>>> the init task has executed), and the system may drop into the rescue
>>> shell if the root device is not found.
And again, you _cannot_ solve this problem completely with initcall
ordering.
Deferred probing with delegation to work queues has the systemic
issue that there is no guarantee that all devices required to
actually proceed to userspace have been initialized at that
point.
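To make the "no guarantee" point concrete, here is a minimal userspace model, assuming a single queue of deferred device-creation items drained by a worker at its own pace. It is not kernel code: `struct deferred_queue`, `queue_creation()`, `worker_run()`, `device_exists()` and the device names are all hypothetical. The `budget` parameter stands for how far the worker happened to get before init looked, and nothing in the model bounds it from below.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model, not kernel code: device creation that
 * was deferred is queued and later drained by a worker at its own pace. */
#define MAX_WORK 8

struct deferred_queue {
	const char *pending[MAX_WORK];	/* devices queued for creation */
	int head, tail;			/* consumer / producer indices */
	const char *created[MAX_WORK];	/* devices that exist so far */
	int ncreated;
};

/* Queue the creation of 'dev'; models the deferral to a workqueue. */
static void queue_creation(struct deferred_queue *q, const char *dev)
{
	assert(q->tail < MAX_WORK);
	q->pending[q->tail++] = dev;
}

/* Process at most 'budget' items. 'budget' models how far the worker got
 * before init looked; nothing in the model bounds it from below. */
static void worker_run(struct deferred_queue *q, int budget)
{
	while (budget-- > 0 && q->head < q->tail)
		q->created[q->ncreated++] = q->pending[q->head++];
}

/* What init observes: does 'dev' exist right now? */
static bool device_exists(const struct deferred_queue *q, const char *dev)
{
	for (int i = 0; i < q->ncreated; i++)
		if (strcmp(q->created[i], dev) == 0)
			return true;
	return false;
}
```

With the same three items queued ("pci-host", "ttyS0", "nvme0"), a worker budget of 1 leaves "ttyS0" missing while a budget of 3 creates everything. The answer init sees depends purely on timing, which is exactly what reordering initcalls shifts around without bounding.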
Changing the initcall priority of a particular driver papers over the
underlying problem to the extent that _you_ cannot observe it anymore,
but that provides exactly _zero_ guarantee that it is correct under all
circumstances. "Works for me" is the worst engineering principle as you
might know already.
That said, I still refuse to take random initcall ordering patches
unless somebody comes up with a coherent explanation of the actual
guarantee.
But before you start to come up with more fairy tales, let me come back
to your two points from above:
>>> - the console/tty device is created late and is sometimes not ready
>>> when the init task checks for its presence. The system crashes in
>>> the latter case since the init task always requires a valid console.
I assume you want to say that console_on_rootfs() fails to open
'/dev/console', right?
That's obvious because console_on_rootfs() is invoked _before_
async_synchronize_full(), which ensures that all outstanding
initialization work has been completed.
The fix for this is obvious too, and it's therefore bloody obvious that
changing the initcall priority of a random driver does not fix that at
all, no?
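The ordering argument can be illustrated with a small userspace model. It assumes, purely for illustration, that the console is registered by the last of three outstanding async items and that nothing has run yet when the console is opened (the worst case). `struct boot_state`, `model_async_synchronize_full()`, `model_console_on_rootfs()` and `model_boot()` are hypothetical stand-ins whose names merely mirror the functions discussed above; reading the implied fix as "synchronize before opening the console" is likewise an interpretation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ordering described above; names mirror
 * init/main.c functions, but none of this is kernel code. */
struct boot_state {
	int outstanding_async;		/* async init work not yet completed */
	bool console_registered;	/* set by the last item (assumption) */
};

/* Stand-in for async_synchronize_full(): complete all outstanding work. */
static void model_async_synchronize_full(struct boot_state *s)
{
	while (s->outstanding_async > 0) {
		s->outstanding_async--;
		if (s->outstanding_async == 0)
			s->console_registered = true;
	}
}

/* Stand-in for console_on_rootfs(): succeeds only if a console exists. */
static bool model_console_on_rootfs(const struct boot_state *s)
{
	return s->console_registered;
}

/* Returns whether opening the console succeeded for the given ordering. */
static bool model_boot(bool console_before_sync)
{
	struct boot_state s = { .outstanding_async = 3,
				.console_registered = false };
	bool ok;

	if (console_before_sync) {
		/* Order criticized above: open first, synchronize later. */
		ok = model_console_on_rootfs(&s);
		model_async_synchronize_full(&s);
	} else {
		/* Barrier first, then open the console. */
		model_async_synchronize_full(&s);
		ok = model_console_on_rootfs(&s);
	}
	return ok;
}
```

In this model `model_boot(true)` fails and `model_boot(false)` succeeds regardless of any driver's initcall level, which is why bumping one driver's priority is beside the point.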
But that's not sufficient, see below.
>>> - the root device is probed and registered late (e.g. NVMe, after
>>> the init task has executed), and the system may drop into the rescue
>>> shell if the root device is not found.
You completely fail to explain how outstanding initializations in work
queues survive past the async_synchronize_full() synchronization
point. You are merely describing random observations on your system, but
you stopped right there without trying to decode the underlying root
cause.
The root cause is:
1) As I already said above, deferred probing does not provide any
guarantees at all.
2) async_synchronize_full() is obviously not the barrier which it is
supposed to be (the misplaced console_on_rootfs() call aside).
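One way a barrier can fall short is by covering only the work it knows about. As far as we understand, async_synchronize_full() waits for work scheduled through the async_schedule() family, whereas an item queued with queue_work() on a system workqueue such as system_dfl_wq is not tracked by it. The model below assumes exactly that split; the message does not spell out the precise failure mode, so treat this as an illustration only. `enum submit_path`, `struct work_item`, `model_barrier()` and `all_done()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a barrier that only knows about work submitted
 * through one interface cannot wait for work submitted through another. */
enum submit_path { VIA_ASYNC_DOMAIN, VIA_PLAIN_WORKQUEUE };

struct work_item {
	enum submit_path path;
	bool done;
};

/* Stand-in for a barrier like async_synchronize_full(): it completes
 * only the items that were registered with it. */
static void model_barrier(struct work_item *items, int n)
{
	for (int i = 0; i < n; i++)
		if (items[i].path == VIA_ASYNC_DOMAIN)
			items[i].done = true;
}

/* True if every item is complete, i.e. the barrier was sufficient. */
static bool all_done(const struct work_item *items, int n)
{
	for (int i = 0; i < n; i++)
		if (!items[i].done)
			return false;
	return true;
}
```

After `model_barrier()` the item submitted via the async domain is complete, but the one queued on the plain workqueue is still pending, so code after the "barrier" can still observe missing devices.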
That needs to be fixed at the conceptual level and not hacked around
with "works for me" patches and fairy tale change logs.
Thanks,
tglx
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
Thread overview: 8+ messages
2026-01-14 6:37 [PATCH] irqchip/riscv-aplic: Register the driver prior to device creation Yicong Yang
2026-01-14 8:57 ` Anup Patel
2026-01-14 11:48 ` Yicong Yang
2026-01-14 19:50 ` Thomas Gleixner [this message]
2026-01-15 8:31 ` Yicong Yang
2026-01-15 13:28 ` Thomas Gleixner
2026-01-16 6:16 ` Yicong Yang
2026-01-16 11:41 ` Thomas Gleixner