public inbox for linux-kernel@vger.kernel.org
From: Baoquan He <bhe@redhat.com>
To: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Khalid Aziz and Shuah Khan <azizkhan@gonehiking.org>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Kairui Song <kasong@redhat.com>,
	linux-pci@vger.kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	Deepa Dinamani <deepa.kernel@gmail.com>,
	Randy Wright <rwright@hpe.com>,
	dyoung@redhat.com
Subject: Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
Date: Sat, 11 Jan 2020 08:45:10 +0800
Message-ID: <20200111004510.GA19291@MiWiFi-R3L-srv>
In-Reply-To: <20200110230003.GB1875851@anatevka.americas.hpqcorp.net>

On 01/10/20 at 04:00pm, Jerry Hoemann wrote:
> > I am not understanding this failure mode either. That code in
> > pci_device_shutdown() was added originally to address this very issue.
> > The patch 4fc9bbf98fd6 ("PCI: Disable Bus Master only on kexec reboot")
> > shut down any errant DMAs from PCI devices as we kexec a new kernel. In
> > this new patch, this is the same code path that will be taken again when
> > kdump kernel is shutting down. If the errant DMA problem was not fixed
> > by clearing Bus Master bit in this path when kdump kernel was being
> > kexec'd, why does the same code path work the second time around when
> > kdump kernel is shutting down? Is there more going on that we don't
> > understand?
> > 
> 
>   Khalid,
> 
>   I don't believe we execute that code path in the crash case.
> 
>   The variable kexec_in_progress is set true in kernel_kexec() before calling
>   machine_kexec().  This is the fast reboot case.
> 
>   I don't see kexec_in_progress set true elsewhere.
> 
> 
>   The code path for crash is different.
> 
>   For instance, panic() will call
> 	-> __crash_kexec()  which calls
> 		-> machine_kexec().
> 
>  So the setting of kexec_in_progress is bypassed.
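
The two paths described in the quote above can be modelled in a few
lines of user-space C. This is only a sketch: the model_* names are
hypothetical stand-ins, the bodies are reduced to the one flag under
discussion, and nothing here is copied from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel flag of the same name. */
static bool kexec_in_progress;

/* Reports the flag value at the point where the jump would happen. */
bool model_machine_kexec(void)
{
	return kexec_in_progress;
}

/* Fast reboot path: the flag is set before machine_kexec() is reached. */
bool model_kernel_kexec(void)
{
	kexec_in_progress = true;
	return model_machine_kexec();
}

/* Crash path: goes straight to machine_kexec(), the flag is untouched. */
bool model_crash_kexec(void)
{
	return model_machine_kexec();
}

/* panic() reaches the jump through the crash path only. */
bool model_panic(void)
{
	return model_crash_kexec();
}
```

Calling model_panic() on a fresh model reports the flag as false, while
model_kernel_kexec() reports it as true, which is the asymmetry Jerry
points out.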

Yeah, it's a different behaviour from the kexec case. I talked to Kairui;
the patch log may not be very clear. Below is a summary of my
understanding of this issue:

~~~~~~~~~~~~~~~~~~~~~~~
Problem:

When a crash is triggered, the system jumps into the kdump kernel to
collect the vmcore and dump it out. After dumping is finished, the kdump
kernel tries to reboot to the normal kernel. The hang happens during
this kdump kernel reboot, when the dump target is on the network (e.g.
ssh/nfs) and the local storage is HPSA.

Root cause:

When network dumping is configured, only the network driver modules are
added to the kdump initramfs. However, the HPSA storage PCIe device was
enabled in the 1st kernel, and its status is PCI_D3hot. When the crashed
system jumps to the kdump kernel, we don't shut down any devices, for
safety and efficiency. Then, during kdump kernel boot, the PCI scan finds
the HPSA device and only initializes its status as
pci_dev->current_state = PCI_UNKNOWN. pci_dev->current_state is normally
updated by the relevant device driver, which is not loaded here. So the
HPSA device never gets a chance to calibrate its status, and can't be
shut down by pci_device_shutdown() when it is called by the reboot
service. It's still PCI_D3hot, and the crash happens when the system
tries to shut down its upper bridge.
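
To illustrate why a device left at PCI_UNKNOWN is skipped, here is a
simplified user-space model of the state guard in pci_device_shutdown().
It is a sketch under assumptions: the model_* names are hypothetical,
the enum only mirrors the ordering of the kernel's power states, and the
reboot-type part of the real condition is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Ordering mirrors the kernel's PCI power states: D0 < ... < UNKNOWN. */
enum model_pci_power {
	MODEL_PCI_D0 = 0,
	MODEL_PCI_D1,
	MODEL_PCI_D2,
	MODEL_PCI_D3hot,
	MODEL_PCI_D3cold,
	MODEL_PCI_UNKNOWN,
};

struct model_pci_dev {
	enum model_pci_power current_state;
	bool bus_master;	/* true while the device may still do DMA */
};

/*
 * Clears bus mastering only for devices whose recorded state is
 * D3hot or shallower. Returns true if the bit was cleared.
 */
bool model_pci_device_shutdown(struct model_pci_dev *dev)
{
	if (dev->current_state <= MODEL_PCI_D3hot) {
		dev->bus_master = false;
		return true;
	}
	return false;	/* D3cold or unknown: device is left alone */
}
```

With current_state = MODEL_PCI_UNKNOWN the function returns false and
bus_master stays set, which matches the driverless HPSA case above.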

Fix:

Here, Kairui uses a quirk to read the PM state and mask off values bigger
than PCI_D3cold. This means all devices will get a PM state of
pci_dev->current_state = PCI_D0 or PCI_D3hot. Finally, during the kdump
reboot stage, this device can be shut down successfully by clearing its
bus master bit.
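
A rough sketch of what reading the PM state through a mask could look
like is below. It is not taken from Kairui's patch: the function name is
hypothetical, and the only value borrowed from the kernel is the 0x0003
state mask (PCI_PM_CTRL_STATE_MASK) applied to the power management
control/status register.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the kernel's PCI_PM_CTRL_STATE_MASK. */
#define MODEL_PM_CTRL_STATE_MASK 0x0003u

/*
 * Decode the power state from a raw PMCSR value. Because only the two
 * low bits are kept, the result is always in the range 0 (D0) to
 * 3 (D3hot), never D3cold or unknown.
 */
unsigned int model_pm_state_from_pmcsr(uint16_t pmcsr)
{
	return pmcsr & MODEL_PM_CTRL_STATE_MASK;
}
```

Whatever the other PMCSR bits contain, the decoded state is at most
D3hot, so a check of the form current_state <= PCI_D3hot would pass.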

~~~~~~~~~~~~~~~

About this patch, I think a quirk that gets the active PM state for all
devices may be risky; it will also affect the normal kernel, which doesn't
have this issue.

I'm wondering if there's any other way to fix or work around it.

Thanks
Baoquan

