* Boot failure due to some interaction between per-port MSI-X and Intel RST
From: John Loy @ 2017-09-04 1:42 UTC (permalink / raw)
To: linux-ide
I have a system that stopped booting Linux between kernel versions 4.4.9
and 4.5.3. It has a SATA + NVMe accelerated volume that I use with
Windows and a separate SATA drive with my Linux installation. I'm not
expecting the remapped NVMe thing to be accessible, just the Linux disk,
but none of the drives are accessible.

Bisecting turned up commit d684a90 as the first bad commit. Passing
pci=nomsi also lets newer kernels boot on this system. Just to be sure, I
built a recent kernel (4.12.9) with the PCI_IRQ_MSIX flag removed from the
per-port call to pci_alloc_irq_vectors() in ahci_init_msi(), and that
kernel also boots normally.

I'm totally out of my depth, though, so I'd really appreciate it if anyone
has some ideas on how to proceed with a proper fix.

Thanks,
John

I can post the entire dmesg output if it would help, but the following
seem to be the most relevant parts:
[ 0.516776] libata version 3.00 loaded.
[ 2.866275] ahci 0000:00:17.0: version 3.0
[ 2.866331] ahci 0000:00:17.0: Found 1 remapped NVMe devices.
[ 2.866331] ahci 0000:00:17.0: Switch your BIOS from RAID to AHCI mode to use them.
[ 2.866335] ahci 0000:00:17.0: controller can't do SNTF, turning off CAP_SNTF
[ 2.876533] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 6 ports 6 Gbps 0x3f impl RAID mode
[ 2.876534] ahci 0000:00:17.0: flags: 64bit ncq led clo only pio slum part deso sadm sds apst
[ 2.876538] ahci 0000:00:17.0: both AHCI_HFLAG_MULTI_MSI flag set and custom irq handler implemented
[ 2.877032] scsi host0: ahci
[ 2.877178] scsi host1: ahci
[ 2.877342] scsi host2: ahci
[ 2.877483] scsi host3: ahci
[ 2.877647] scsi host4: ahci
[ 2.877831] scsi host5: ahci
[ 2.877845] ata1: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200100 irq 121
[ 2.877846] ata2: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200180 irq 122
[ 2.877847] ata3: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200200 irq 123
[ 2.877848] ata4: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200280 irq 124
[ 2.877849] ata5: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200300 irq 125
[ 2.877850] ata6: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200380 irq 126
[ 3.186836] ata6: SATA link down (SStatus 4 SControl 300)
[ 3.186882] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 3.186919] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 3.186970] ata2: SATA link down (SStatus 4 SControl 300)
[ 3.187003] ata1: SATA link down (SStatus 4 SControl 300)
[ 3.187042] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 8.676029] ata5.00: qc timeout (cmd 0xa1)
[ 8.676037] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 8.676054] ata4.00: qc timeout (cmd 0xec)
[ 8.676062] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 8.676079] ata3.00: qc timeout (cmd 0xec)
[ 8.676087] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 8.986227] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 8.986282] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8.986309] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 19.428027] ata4.00: qc timeout (cmd 0xec)
[ 19.428035] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 19.428039] ata4: limiting SATA link speed to 1.5 Gbps
[ 19.428056] ata3.00: qc timeout (cmd 0xec)
[ 19.428064] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 19.428068] ata3: limiting SATA link speed to 3.0 Gbps
[ 19.428086] ata5.00: qc timeout (cmd 0xa1)
[ 19.428094] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 19.428097] ata5: limiting SATA link speed to 1.5 Gbps
[ 19.738226] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 19.738251] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 19.738276] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 50.148029] ata5.00: qc timeout (cmd 0xa1)
[ 50.148037] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 50.148054] ata4.00: qc timeout (cmd 0xec)
[ 50.148062] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 50.148078] ata3.00: qc timeout (cmd 0xec)
[ 50.148086] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 50.458231] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 50.458260] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 50.458289] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
* Re: Boot failure due to some interaction between per-port MSI-X and Intel RST
From: Christoph Hellwig @ 2017-09-04 6:16 UTC (permalink / raw)
To: John Loy; +Cc: linux-ide, Dan Williams
On Sun, Sep 03, 2017 at 06:42:35PM -0700, John Loy wrote:
> I'm totally out of my depth though so I'd really appreciate it if anyone has
> some ideas on how to proceed with a proper fix.
Something like the patch below should work. Maybe Intel can provide
an explanation on why their chipset is so fucked up that we can add
as a comment.
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 5a5fd0b404eb..b8c8ecc854c4 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1470,6 +1470,7 @@ static void ahci_remap_check(struct pci_dev *pdev, int bar,
 
 	dev_warn(&pdev->dev, "Found %d remapped NVMe devices.\n", count);
 	dev_warn(&pdev->dev, "Switch your BIOS from RAID to AHCI mode to use them.\n");
+	hpriv->flags |= AHCI_HFLAG_NO_MSI;
 }
 
 static int ahci_get_irq_vector(struct ata_host *host, int port)
* Re: Boot failure due to some interaction between per-port MSI-X and Intel RST
From: John Loy @ 2017-09-05 15:32 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-ide, Dan Williams
On 9/3/17 11:16 PM, Christoph Hellwig wrote:
> Something like the patch below should work. Maybe Intel can provide
> an explanation on why their chipset is so fucked up that we can add
> as a comment.
>
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 5a5fd0b404eb..b8c8ecc854c4 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1470,6 +1470,7 @@ static void ahci_remap_check(struct pci_dev *pdev, int bar,
>
> dev_warn(&pdev->dev, "Found %d remapped NVMe devices.\n", count);
> dev_warn(&pdev->dev, "Switch your BIOS from RAID to AHCI mode to use them.\n");
> + hpriv->flags |= AHCI_HFLAG_NO_MSI;
> }
>
> static int ahci_get_irq_vector(struct ata_host *host, int port)
>
This patch works for me.
* Re: Boot failure due to some interaction between per-port MSI-X and Intel RST
From: Dan Williams @ 2017-09-05 15:46 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: John Loy, IDE/ATA development list
On Sun, Sep 3, 2017 at 11:16 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Something like the patch below should work. Maybe Intel can provide
> an explanation on why their chipset is so fucked up that we can add
> as a comment.
>
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 5a5fd0b404eb..b8c8ecc854c4 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1470,6 +1470,7 @@ static void ahci_remap_check(struct pci_dev *pdev, int bar,
>
> dev_warn(&pdev->dev, "Found %d remapped NVMe devices.\n", count);
> dev_warn(&pdev->dev, "Switch your BIOS from RAID to AHCI mode to use them.\n");
> + hpriv->flags |= AHCI_HFLAG_NO_MSI;
> }
>
> static int ahci_get_irq_vector(struct ata_host *host, int port)
Yes, this patch looks good to me. As I said here [1]:
+ /*
+ * Don't rely on the msi-x capability in the remap case,
+ * share the legacy interrupt across ahci and remapped
+ * devices.
+ */
...we need to use pci-intx interrupts for both devices.
[1]: http://lists.infradead.org/pipermail/linux-nvme/2016-October/006801.html