From: Bjorn Helgaas <helgaas@kernel.org>
To: Prabhakar Kushwaha <prabhakar.pkin@gmail.com>,
Jens Axboe <axboe@kernel.dk>
Cc: linux-pci@vger.kernel.org, linux-ide@vger.kernel.org,
Ganapatrao Prabhakerrao Kulkarni <gkulkarni@marvell.com>,
Kamlakant Patel <kamlakantp@marvell.com>,
kexec mailing list <kexec@lists.infradead.org>
Subject: Re: kexec -e not working: root disk not able to detect
Date: Thu, 9 Jan 2020 18:26:38 -0600
Message-ID: <20200110002638.GA50413@google.com>
In-Reply-To: <CAJ2QiJ+MVVztHONagmYc2-BzbtdGQhABRKO7h4+kOE9cCK=CxA@mail.gmail.com>
[+cc Jens, ahci.c maintainer]
On Mon, Jan 06, 2020 at 05:24:44PM +0530, Prabhakar Kushwaha wrote:
> Hi All,
>
> I am trying kexec -e with the latest kernel, i.e. Linux-5.5.0-rc4. The
> second kernel is not able to detect/mount the hard disk holding the root
> file system (INTEL SSDSC2BB240G7).
>
> [ 279.690575] ata1: softreset failed (1st FIS failed)
> [ 279.695446] ata1: limiting SATA link speed to 3.0 Gbps
> [ 280.910020] ata1: SATA link down (SStatus 0 SControl 320)
> [ 282.626018] ata1: SATA link down (SStatus 0 SControl 300)
> [ 282.631409] ata1: link online but 1 devices misclassified, retrying
> [ 282.637665] ata1: reset failed (errno=-11), retrying in 9 secs
> [ 298.294546] ata1: failed to reset engine (errno=-5)
> [ 302.042967] ata1: softreset failed (1st FIS failed)
> [ 308.798609] ata1: failed to reset engine (errno=-5)
> [ 337.546605] ata1: softreset failed (1st FIS failed)
> [ 337.551475] ata1: limiting SATA link speed to 3.0 Gbps
> [ 338.766022] ata1: SATA link down (SStatus 0 SControl 320)
> [ 339.270943] ata1: EH pending after 5 tries, giving up
>
> I found the following two workarounds for this issue.
> A) Define ".shutdown" in drivers/ata/ahci.c.
>
> reboot --> kernel_kexec() --> kernel_restart_prepare() -->
> device_shutdown() --> pci_device_shutdown() --> ahci_shutdown_one()
> --> new function
>
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 4bfd1b14b390..50a101002885 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -81,6 +81,7 @@ enum board_ids {
>
> static int ahci_init_one(struct pci_dev *pdev, const struct
> pci_device_id *ent);
> static void ahci_remove_one(struct pci_dev *dev);
> +static void ahci_shutdown_one(struct pci_dev *dev);
> static int ahci_vt8251_hardreset(struct ata_link *link, unsigned int *class,
> unsigned long deadline);
> static int ahci_avn_hardreset(struct ata_link *link, unsigned int *class,
> @@ -606,6 +607,7 @@ static struct pci_driver ahci_pci_driver = {
> .id_table = ahci_pci_tbl,
> .probe = ahci_init_one,
> .remove = ahci_remove_one,
> + .shutdown = ahci_shutdown_one,
> .driver = {
> .pm = &ahci_pci_pm_ops,
> },
>
> +static void ahci_shutdown_one(struct pci_dev *pdev)
> +{
> + pm_runtime_get_noresume(&pdev->dev);
> + ata_pci_remove_one(pdev);
> +}
> +
> Note: After defining shutdown, errors related to file-system writes
> are seen. It looks like file-system transactions still go to the disk
> even after device_shutdown().
>
> B) Comment out pci_clear_master() in pci_device_shutdown()
> reboot --> kernel_kexec() --> kernel_restart_prepare() -->
> device_shutdown() --> pci_device_shutdown()
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 0454ca0e4e3f..ddffaa9321bb 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -481,8 +481,10 @@ static void pci_device_shutdown(struct device *dev)
> /*
> * If this is a kexec reboot, turn off Bus Master bit on the
> @@ -491,8 +493,16 @@ static void pci_device_shutdown(struct device *dev)
> * If it is not a kexec reboot, firmware will hit the PCI
> * devices with big hammer and stop their DMA any way.
> */
>
> - if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> - pci_clear_master(pci_dev);
I doubt we would remove this without a much clearer justification.
> Here pci_dev->current_state is "0", i.e. D0.
>
> From A and B, it looks like some DMA transactions are still going on
> in the PCIe device even after pci_clear_master(), leaving the device
> in an unstable state. I am not sure if this is the reason, or how to
> solve this problem.
Is it possible the ahci driver depends on receiving the device with
bus mastering already enabled? I would guess that would be the common
case in a non-kexec boot -- the BIOS probably hands off the device
with bus mastering enabled.
ahci_init_one() does turn on bus mastering itself (it calls
pci_set_master()), but it's near the end, so if anything before that
depends on DMA, it wouldn't work.
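
One way to test this hypothesis from userspace is to look at the Bus
Master Enable bit in the controller's PCI command register before and
after the kexec. The following is only a minimal sketch, not kernel
code: the helper names (bus_master_enabled, clear_bus_master,
set_bus_master, read_pci_command) are hypothetical, and it assumes the
device's config space is exposed through a sysfs "config" file. The
register offset (0x04) and bit value (0x4) are the standard PCI ones.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Standard PCI config-space layout: the 16-bit command register sits at
 * offset 0x04, and bit 2 is Bus Master Enable (the kernel calls these
 * PCI_COMMAND and PCI_COMMAND_MASTER). */
#define CFG_COMMAND_OFFSET 0x04
#define CFG_COMMAND_MASTER 0x0004

/* Hypothetical helper: true if Bus Master Enable is set in cmd. */
bool bus_master_enabled(uint16_t cmd)
{
	return (cmd & CFG_COMMAND_MASTER) != 0;
}

/* Models the effect of pci_clear_master() on the command word. */
uint16_t clear_bus_master(uint16_t cmd)
{
	return (uint16_t)(cmd & ~CFG_COMMAND_MASTER);
}

/* Models the effect of pci_set_master() on the command word. */
uint16_t set_bus_master(uint16_t cmd)
{
	return (uint16_t)(cmd | CFG_COMMAND_MASTER);
}

/* Read the little-endian command register from a config-space file.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_pci_command(const char *config_path, uint16_t *cmd)
{
	unsigned char buf[2];
	FILE *f = fopen(config_path, "rb");

	if (!f)
		return -1;
	if (fseek(f, CFG_COMMAND_OFFSET, SEEK_SET) != 0 ||
	    fread(buf, 1, sizeof(buf), f) != sizeof(buf)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	*cmd = (uint16_t)(buf[0] | (buf[1] << 8));
	return 0;
}
```

Calling read_pci_command() on the AHCI controller's sysfs config file
(the exact path depends on the device's bus address, which is not given
in this thread) and passing the result to bus_master_enabled() would
show whether the second kernel received the device with bus mastering
off.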
And I don't know how adding a shutdown method would also be a
workaround.
Bjorn