From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
To: Stephen Boyd <swboyd@chromium.org>
Cc: "Krishna Chaitanya Chundru" <quic_krichai@quicinc.com>,
helgaas@kernel.org, linux-pci@vger.kernel.org,
linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
mka@chromium.org, quic_vbadigan@quicinc.com,
quic_hemantk@quicinc.com, quic_nitegupt@quicinc.com,
quic_skananth@quicinc.com, quic_ramkri@quicinc.com,
dmitry.baryshkov@linaro.org, "Jingoo Han" <jingoohan1@gmail.com>,
"Gustavo Pimentel" <gustavo.pimentel@synopsys.com>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Krzysztof Wilczyński" <kw@linux.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Andy Gross" <agross@kernel.org>,
"Bjorn Andersson" <bjorn.andersson@linaro.org>,
"Stanimir Varbanov" <svarbanov@mm-sol.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Marc Zyngier" <maz@kernel.org>
Subject: Re: [PATCH v5 2/3] PCI: qcom: Restrict pci transactions after pci suspend
Date: Sat, 27 Aug 2022 22:56:55 +0530
Message-ID: <20220827172655.GA14465@thinkpad>
In-Reply-To: <CAE-0n50NRiBNDjK2UrA_wOoRz3+3cKb4uiUiCw4t1F19Kw9EhA@mail.gmail.com>
On Fri, Aug 26, 2022 at 03:23:00PM -0500, Stephen Boyd wrote:
> Quoting Krishna Chaitanya Chundru (2022-08-25 06:52:43)
> >
> > On 8/24/2022 10:50 PM, Stephen Boyd wrote:
> > > Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59)
> > >> On 8/9/2022 12:42 AM, Stephen Boyd wrote:
> > >>> Quoting Krishna chaitanya chundru (2022-08-03 04:28:53)
> > >>>> If the endpoint device state is D0 and irq's are not freed, then
> > >>>> kernel try to mask interrupts in system suspend path by writing
> > >>>> in to the vector table (for MSIX interrupts) and config space (for MSI's).
> > >>>>
> > >>>> These transactions are initiated in the pm suspend after pcie clocks got
> > >>>> disabled as part of platform driver pm suspend call. Due to it, these
> > >>>> transactions are resulting in un-clocked access and eventually to crashes.
> > >>> Why are the platform driver pm suspend calls disabling clks that early?
> > >>> Can they disable clks in noirq phase, or even later, so that we don't
> > >>> have to check if the device is clocking in the irq poking functions?
> > >>> It's best to keep irq operations fast, so that irq control is fast given
> > >>> that these functions are called from irq flow handlers.
> > >> We are registering the pcie pm suspend ops as noirq ops only. And this
> > >> msix and config
> > >>
> > >> access is coming at the later point of time that is reason we added that
> > >> check.
> > >>
> > > What is accessing msix and config? Can you dump_stack() after noirq ops
> > > are called and figure out what is trying to access the bus when it is
> > > powered down?
> >
> > The msix and config space is being accessed to mask interrupts. The
> > access is coming at the end of the suspend
> >
> > and near CPU disable. We tried to dump the stack there but the call
> > stack is not coming as it is near cpu disable.
>
> That is odd that you can't get a stacktrace.
>
> >
> > But we got dump at resume please have look at it
> >
> > [ 54.946268] Enabling non-boot CPUs ...
> > [ 54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105
> > 43491e4414b1db8a6f59d56b617b520d92a9498e
> > [ 54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP
> > SKU2 platform (DT)
> > [ 54.969088] Call trace:
> > [ 54.971612] dump_backtrace+0x0/0x200
> > [ 54.975399] show_stack+0x20/0x2c
> > [ 54.978826] dump_stack_lvl+0x6c/0x90
> > [ 54.982614] dump_stack+0x18/0x38
> > [ 54.986043] dw_msi_unmask_irq+0x2c/0x58
> > [ 54.990096] irq_enable+0x58/0x90
> > [ 54.993522] __irq_startup+0x68/0x94
> > [ 54.997216] irq_startup+0xf4/0x140
> > [ 55.000820] irq_affinity_online_cpu+0xc8/0x154
> > [ 55.005491] cpuhp_invoke_callback+0x19c/0x6e4
> > [ 55.010077] cpuhp_thread_fun+0x11c/0x188
> > [ 55.014216] smpboot_thread_fn+0x1ac/0x30c
> > [ 55.018445] kthread+0x140/0x30c
> > [ 55.021788] ret_from_fork+0x10/0x20
> > [ 55.028243] CPU1 is up
> >
> > So the same stack should be called at the suspend path while disabling CPU.
>
> Sounds like you're getting hit by affinity changes while offlining CPUs
> during suspend (see irq_migrate_all_off_this_cpu()). That will happen
> after devices are suspended (all phases of suspend ops).

The affinity setting should not happen, since the DWC MSI controller doesn't
support setting IRQ affinity (hierarchical IRQ domain). The migrate_one_irq()
function checks for the existence of the irq_set_affinity() callback, but the
DWC MSI controller does provide one; it just returns -EINVAL. So is this the
reason the migration was still attempted?

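To make the question concrete, here is a minimal user-space sketch of the
suspected behaviour. It is not the kernel code: struct model_irq_chip,
model_migrate_one_irq() and both chips are hypothetical stand-ins, and the
only assumption carried over is that the migration path tests whether the
callback pointer is set, not what it returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct irq_chip; only one callback is modelled. */
struct model_irq_chip {
	const char *name;
	int (*irq_set_affinity)(unsigned int irq, unsigned int cpu);
};

/* Models a DWC-style MSI chip: the callback exists but always refuses. */
static int dwc_like_set_affinity(unsigned int irq, unsigned int cpu)
{
	(void)irq;
	(void)cpu;
	return -EINVAL;
}

/* Models a chip that can really retarget an interrupt (e.g. an ITS-like one). */
static int its_like_set_affinity(unsigned int irq, unsigned int cpu)
{
	(void)irq;
	(void)cpu;
	return 0;
}

static const struct model_irq_chip dwc_like_chip = {
	.name = "dwc-like",
	.irq_set_affinity = dwc_like_set_affinity,
};

static const struct model_irq_chip its_like_chip = {
	.name = "its-like",
	.irq_set_affinity = its_like_set_affinity,
};

/*
 * Returns true if the interrupt was moved to target_cpu.
 * *attempted reports whether the chip callback was invoked at all.
 */
static bool model_migrate_one_irq(const struct model_irq_chip *chip,
				  unsigned int irq, unsigned int target_cpu,
				  bool *attempted)
{
	int err;

	*attempted = false;

	/* Existence check only: a callback that always fails still passes. */
	if (!chip || !chip->irq_set_affinity)
		return false;

	*attempted = true;
	err = chip->irq_set_affinity(irq, target_cpu);
	if (err) {
		fprintf(stderr, "IRQ%u (%s): set affinity failed (%d)\n",
			irq, chip->name, err);
		return false;
	}

	return true;
}
```

With the dwc-like chip the attempt is made and then fails with -EINVAL; only
a chip with no callback at all is skipped up front.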
A quick check would be to test this suspend/resume with GIC ITS for MSI, since
it supports setting IRQ affinity and resides in a separate domain.

Chaitanya, can you try that?

>
> >
> > If there is any other way to remove these calls can you please help us
> > point that way.
>
> I'm not sure. I believe genirq assumes the irqchips are always
> accessible. There is some support to suspend irqchips. See how the
> struct irq_chip::irq_suspend() function is called by syscore ops in the
> generic irqchip 'irq_gc_syscore_ops' hooks. Maybe you could add a
> syscore suspend/resume hook to disable/enable the clks and power to the
> PCI controller. syscore ops run after secondary CPUs are hotplugged out
> during suspend.
>
> Or maybe setting the IRQCHIP_MASK_ON_SUSPEND flag can be used so that on
> irq migration nothing writes the irq hardware because it is already
> masked in the hardware earlier. I think the problem is that on resume
> we'll restart the irq from the first CPU online event, when you don't
> want to do that because it is too early.
>
> I have another question though, which is do MSIs support wakeup? I don't
> see how it works if the whole bus is effectively off during suspend. If
> wakeup needs to be supported then I suspect the bus can't be powered
> down during suspend.

Wakeup should be handled by a dedicated side-band GPIO or an in-band PME
message.

But I still wonder how the link stays in L1/L1ss when the clocks are disabled
and the PHY is powered down. Maybe the link or PHY is powered by a separate
power domain, like MX, that keeps the link active?

Thanks,
Mani

--
மணிவண்ணன் சதாசிவம்