From: Jon Hunter <jonathanh@nvidia.com>
To: Radu Rendec <rrendec@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Manivannan Sadhasivam <mani@kernel.org>
Cc: "Daniel Tsai" <danielsftsai@google.com>,
"Marek Behún" <kabel@kernel.org>,
"Krishna Chaitanya Chundru" <quic_krichai@quicinc.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Rob Herring" <robh@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Jingoo Han" <jingoohan1@gmail.com>,
"Brian Masney" <bmasney@redhat.com>,
"Eric Chanudet" <echanude@redhat.com>,
"Alessandro Carminati" <acarmina@redhat.com>,
"Jared Kangas" <jkangas@redhat.com>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>
Subject: Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support
Date: Fri, 23 Jan 2026 13:25:28 +0000
Message-ID: <3edbf9bd-4542-40d3-888c-470e793a46c0@nvidia.com>
In-Reply-To: <96c9d483f67be02fa1dba736fea465216d0c3269.camel@redhat.com>

On 22/01/2026 23:31, Radu Rendec wrote:
...
> Thanks very much for running the test and for the logs. The good news
> is good ol' printk debugging seems to be working, and the last message
> in the log is indeed related to dw-pci irq affinity control, which is
> what the patch touches. So we're on to something. The bad news is I
> can't yet figure out what's wrong.
>
> The CPUs are taken offline one by one, starting with CPU 7. The code in
> question runs on the dying CPU, and with hardware interrupts disabled
> on all CPUs. The (simplified) call stack looks like this:
>
> irq_migrate_all_off_this_cpu
>   for_each_active_irq
>     migrate_one_irq
>       irq_do_set_affinity
>         irq_chip_redirect_set_affinity (via chip->irq_set_affinity)
>
> The debug patch I gave you adds:
> * a printk to irq_chip_redirect_set_affinity (which is very small)
> * a printk at the beginning of migrate_one_irq
>
> Also, the call to irq_do_set_affinity is almost the last thing that
> happens in migrate_one_irq, and that for_each_active_irq loop is quite
> small too. So, there isn't much happening between the printk in
> irq_chip_redirect_set_affinity for the msi irq (which we do see in the
> log) and the printk in migrate_one_irq for the next irq (which we don't
> see).
>
> My first thought is to add more printk's between those two and narrow
> down the spot where it gets stuck.
>
> I think the fastest way to debug it is if I can test myself. I tried to
> reproduce the issue on a Jetson AGX Orin, and I couldn't. By the way,
> how often does it hang? e.g., out of say 10 suspend attempts, how many
> fail?

For Jetson AGX Xavier it fails on the first suspend attempt.

> I do have access to a Jetson Xavier NX (in theory) but it looks like
> there's a lab issue with that board, which hopefully gets sorted out
> tomorrow. If I can't get a hold of that board (or can't reproduce the
> problem on it), I may ask you to try a few other things. In any case,
> I'll update this thread again either tomorrow or (more likely) early
> next week.

Weirdly, I don't see this with Jetson Xavier NX. It could still be worth
trying, but you may wish to revert this change [0] because it is causing
other issues for Jetson Xavier NX.

Jon
[0] https://lore.kernel.org/linux-tegra/e32b0819-2c29-4c83-83d5-e28dc4b2b01f@nvidia.com/
--
nvpublic