public inbox for linux-ide@vger.kernel.org
* Boot failure due to some interaction between per-port MSI-X and Intel RST
From: John Loy @ 2017-09-04  1:42 UTC
  To: linux-ide

I have a system that stopped booting Linux between kernel versions 4.4.9 
and 4.5.3.  It has a SATA + NVMe accelerated volume that I use with 
Windows and a separate SATA drive with my Linux installation.  I'm not 
expecting the remapped NVMe thing to be accessible, just the Linux disk, 
but none of the drives are accessible.

Bisecting the changes turned up d684a90 as the first failing change. 
Passing pci=nomsi also allows the system to boot newer kernels.  Just to 
be sure, I built a recent kernel (4.12.9) with the PCI_IRQ_MSIX flag 
removed from the per-port call to pci_alloc_irq_vectors in 
ahci_init_msi.  This also allowed the system to boot normally.

I'm totally out of my depth though so I'd really appreciate it if anyone 
has some ideas on how to proceed with a proper fix.

Thanks,
John

I can post the entire dmesg output if it would be helpful but the 
following seems like the most relevant parts:

[    0.516776] libata version 3.00 loaded.
[    2.866275] ahci 0000:00:17.0: version 3.0
[    2.866331] ahci 0000:00:17.0: Found 1 remapped NVMe devices.
[    2.866331] ahci 0000:00:17.0: Switch your BIOS from RAID to AHCI mode to use them.
[    2.866335] ahci 0000:00:17.0: controller can't do SNTF, turning off CAP_SNTF
[    2.876533] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 6 ports 6 Gbps 0x3f impl RAID mode
[    2.876534] ahci 0000:00:17.0: flags: 64bit ncq led clo only pio slum part deso sadm sds apst
[    2.876538] ahci 0000:00:17.0: both AHCI_HFLAG_MULTI_MSI flag set and custom irq handler implemented
[    2.877032] scsi host0: ahci
[    2.877178] scsi host1: ahci
[    2.877342] scsi host2: ahci
[    2.877483] scsi host3: ahci
[    2.877647] scsi host4: ahci
[    2.877831] scsi host5: ahci
[    2.877845] ata1: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200100 irq 121
[    2.877846] ata2: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200180 irq 122
[    2.877847] ata3: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200200 irq 123
[    2.877848] ata4: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200280 irq 124
[    2.877849] ata5: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200300 irq 125
[    2.877850] ata6: SATA max UDMA/133 abar m524288@0xdf200000 port 0xdf200380 irq 126
[    3.186836] ata6: SATA link down (SStatus 4 SControl 300)
[    3.186882] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    3.186919] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.186970] ata2: SATA link down (SStatus 4 SControl 300)
[    3.187003] ata1: SATA link down (SStatus 4 SControl 300)
[    3.187042] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    8.676029] ata5.00: qc timeout (cmd 0xa1)
[    8.676037] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    8.676054] ata4.00: qc timeout (cmd 0xec)
[    8.676062] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    8.676079] ata3.00: qc timeout (cmd 0xec)
[    8.676087] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    8.986227] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    8.986282] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    8.986309] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   19.428027] ata4.00: qc timeout (cmd 0xec)
[   19.428035] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   19.428039] ata4: limiting SATA link speed to 1.5 Gbps
[   19.428056] ata3.00: qc timeout (cmd 0xec)
[   19.428064] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   19.428068] ata3: limiting SATA link speed to 3.0 Gbps
[   19.428086] ata5.00: qc timeout (cmd 0xa1)
[   19.428094] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   19.428097] ata5: limiting SATA link speed to 1.5 Gbps
[   19.738226] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   19.738251] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   19.738276] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   50.148029] ata5.00: qc timeout (cmd 0xa1)
[   50.148037] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   50.148054] ata4.00: qc timeout (cmd 0xec)
[   50.148062] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   50.148078] ata3.00: qc timeout (cmd 0xec)
[   50.148086] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   50.458231] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   50.458260] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   50.458289] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
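
A note for reading the link lines in the log above: libata prints SStatus and SControl in hexadecimal, and the SATA specification packs DET into bits 3:0, SPD into bits 7:4 and IPM into bits 11:8 of each register. The sketch below decodes those fields. The helper names (reg_det, reg_spd, reg_ipm, spd_name, describe_link) are made up for illustration and are not libata functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Field extraction for SStatus/SControl (bit layout from the SATA spec). */
static unsigned reg_det(uint32_t reg) { return reg & 0xf; }        /* bits 3:0  */
static unsigned reg_spd(uint32_t reg) { return (reg >> 4) & 0xf; } /* bits 7:4  */
static unsigned reg_ipm(uint32_t reg) { return (reg >> 8) & 0xf; } /* bits 11:8 */

/*
 * Map an SPD field to a speed string. A value of 0 means "no speed
 * negotiated" in SStatus and "no speed limit" in SControl.
 */
static const char *spd_name(unsigned spd)
{
    switch (spd) {
    case 1:  return "1.5 Gbps";
    case 2:  return "3.0 Gbps";
    case 3:  return "6.0 Gbps";
    default: return "none/unrestricted";
    }
}

/* Print one decoded register pair, e.g. describe_link(0x133, 0x300). */
static void describe_link(uint32_t sstatus, uint32_t scontrol)
{
    printf("SStatus %X: DET=%u SPD=%u (%s) IPM=%u\n",
           (unsigned)sstatus, reg_det(sstatus), reg_spd(sstatus),
           spd_name(reg_spd(sstatus)), reg_ipm(sstatus));
    printf("SControl %X: DET=%u SPD limit=%u (%s) IPM=%u\n",
           (unsigned)scontrol, reg_det(scontrol), reg_spd(scontrol),
           spd_name(reg_spd(scontrol)), reg_ipm(scontrol));
}
```

Applied to the log, SStatus 133 is an established link (DET=3) at 6.0 Gbps, SStatus 4 is a port whose PHY is offline (DET=4), and the SControl change from 300 to 310 or 320 matches the "limiting SATA link speed" messages. For the command lines, cmd 0xec is IDENTIFY DEVICE, cmd 0xa1 is IDENTIFY PACKET DEVICE, and err_mask=0x4 is libata's timeout bit. The links keep training successfully while every command times out, which would be consistent with completions not being delivered as interrupts, though the log alone does not prove that.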


* Re: Boot failure due to some interaction between per-port MSI-X and Intel RST
From: Christoph Hellwig @ 2017-09-04  6:16 UTC
  To: John Loy; +Cc: linux-ide, Dan Williams

On Sun, Sep 03, 2017 at 06:42:35PM -0700, John Loy wrote:
> I have a system that stopped booting Linux between kernel versions 4.4.9 and
> 4.5.3.  It has a SATA + NVMe accelerated volume that I use with Windows and
> a separate SATA drive with my Linux installation.  I'm not expecting the
> remapped NVMe thing to be accessible, just the Linux disk, but none of the
> drives are accessible.
> 
> Bisecting the changes turned up d684a90 as the first failing change. Passing
> pci=nomsi also allows the system to boot newer kernels.  Just to be sure, I
> built a recent kernel (4.12.9) with the PCI_IRQ_MSIX flag removed from the
> per-port call to pci_alloc_irq_vectors in ahci_init_msi.  This also allowed
> the system to boot normally.
> 
> I'm totally out of my depth though so I'd really appreciate it if anyone has
> some ideas on how to proceed with a proper fix.

Something like the patch below should work.  Maybe Intel can provide
an explanation on why their chipset is so fucked up that we can add
as a comment.

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 5a5fd0b404eb..b8c8ecc854c4 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1470,6 +1470,7 @@ static void ahci_remap_check(struct pci_dev *pdev, int bar,
 
 	dev_warn(&pdev->dev, "Found %d remapped NVMe devices.\n", count);
 	dev_warn(&pdev->dev, "Switch your BIOS from RAID to AHCI mode to use them.\n");
+	hpriv->flags |= AHCI_HFLAG_NO_MSI;
 }
 
 static int ahci_get_irq_vector(struct ata_host *host, int port)


* Re: Boot failure due to some interaction between per-port MSI-X and Intel RST
From: John Loy @ 2017-09-05 15:32 UTC
  To: Christoph Hellwig; +Cc: linux-ide, Dan Williams

On 9/3/17 11:16 PM, Christoph Hellwig wrote:
> On Sun, Sep 03, 2017 at 06:42:35PM -0700, John Loy wrote:
>> I have a system that stopped booting Linux between kernel versions 4.4.9 and
>> 4.5.3.  It has a SATA + NVMe accelerated volume that I use with Windows and
>> a separate SATA drive with my Linux installation.  I'm not expecting the
>> remapped NVMe thing to be accessible, just the Linux disk, but none of the
>> drives are accessible.
>>
>> Bisecting the changes turned up d684a90 as the first failing change. Passing
>> pci=nomsi also allows the system to boot newer kernels.  Just to be sure, I
>> built a recent kernel (4.12.9) with the PCI_IRQ_MSIX flag removed from the
>> per-port call to pci_alloc_irq_vectors in ahci_init_msi.  This also allowed
>> the system to boot normally.
>>
>> I'm totally out of my depth though so I'd really appreciate it if anyone has
>> some ideas on how to proceed with a proper fix.
> 
> Something like the patch below should work.  Maybe Intel can provide
> an explanation on why their chipset is so fucked up that we can add
> as a comment.
> 
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 5a5fd0b404eb..b8c8ecc854c4 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1470,6 +1470,7 @@ static void ahci_remap_check(struct pci_dev *pdev, int bar,
>   
>   	dev_warn(&pdev->dev, "Found %d remapped NVMe devices.\n", count);
>   	dev_warn(&pdev->dev, "Switch your BIOS from RAID to AHCI mode to use them.\n");
> +	hpriv->flags |= AHCI_HFLAG_NO_MSI;
>   }
>   
>   static int ahci_get_irq_vector(struct ata_host *host, int port)
> 

This patch works for me.


* Re: Boot failure due to some interaction between per-port MSI-X and Intel RST
From: Dan Williams @ 2017-09-05 15:46 UTC
  To: Christoph Hellwig; +Cc: John Loy, IDE/ATA development list

On Sun, Sep 3, 2017 at 11:16 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sun, Sep 03, 2017 at 06:42:35PM -0700, John Loy wrote:
>> I have a system that stopped booting Linux between kernel versions 4.4.9 and
>> 4.5.3.  It has a SATA + NVMe accelerated volume that I use with Windows and
>> a separate SATA drive with my Linux installation.  I'm not expecting the
>> remapped NVMe thing to be accessible, just the Linux disk, but none of the
>> drives are accessible.
>>
>> Bisecting the changes turned up d684a90 as the first failing change. Passing
>> pci=nomsi also allows the system to boot newer kernels.  Just to be sure, I
>> built a recent kernel (4.12.9) with the PCI_IRQ_MSIX flag removed from the
>> per-port call to pci_alloc_irq_vectors in ahci_init_msi.  This also allowed
>> the system to boot normally.
>>
>> I'm totally out of my depth though so I'd really appreciate it if anyone has
>> some ideas on how to proceed with a proper fix.
>
> Something like the patch below should work.  Maybe Intel can provide
> an explanation on why their chipset is so fucked up that we can add
> as a comment.
>
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 5a5fd0b404eb..b8c8ecc854c4 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1470,6 +1470,7 @@ static void ahci_remap_check(struct pci_dev *pdev, int bar,
>
>         dev_warn(&pdev->dev, "Found %d remapped NVMe devices.\n", count);
>         dev_warn(&pdev->dev, "Switch your BIOS from RAID to AHCI mode to use them.\n");
> +       hpriv->flags |= AHCI_HFLAG_NO_MSI;
>  }
>
>  static int ahci_get_irq_vector(struct ata_host *host, int port)

Yes, this patch looks good to me.  As I said here [1]:

+               /*
+                * Don't rely on the msi-x capability in the remap case,
+                * share the legacy interrupt across ahci and remapped
+                * devices.
+                */

...we need to use pci-intx interrupts for both devices.

[1]: http://lists.infradead.org/pipermail/linux-nvme/2016-October/006801.html
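
To make the effect of the patch concrete, here is a small userspace model of the decision it changes. It is a sketch under stated assumptions, not the kernel code: the struct, the MODEL_HFLAG_NO_MSI constant and both functions are hypothetical stand-ins, and the real ahci_init_msi has more fallback steps (per-port vectors, then single MSI, then single MSI-X) than the one-line preference order shown here.

```c
#include <stdbool.h>

enum irq_mode { IRQ_INTX, IRQ_MSI, IRQ_MSIX };

/* Hypothetical stand-in for AHCI_HFLAG_NO_MSI; not the kernel's value. */
#define MODEL_HFLAG_NO_MSI (1u << 0)

/* Hypothetical, heavily reduced view of the host state. */
struct model_host {
    unsigned flags;
    bool has_msix_cap;   /* device advertises an MSI-X capability */
    bool has_msi_cap;    /* device advertises an MSI capability */
    bool remapped_nvme;  /* RST has remapped NVMe devices behind AHCI */
};

/* Models the patched ahci_remap_check: remapping forbids MSI/MSI-X. */
static void model_remap_check(struct model_host *h)
{
    if (h->remapped_nvme)
        h->flags |= MODEL_HFLAG_NO_MSI;
}

/* Simplified preference order: MSI-X, then MSI, then legacy INTx. */
static enum irq_mode model_pick_irq(const struct model_host *h)
{
    if (h->flags & MODEL_HFLAG_NO_MSI)
        return IRQ_INTX;
    if (h->has_msix_cap)
        return IRQ_MSIX;
    if (h->has_msi_cap)
        return IRQ_MSI;
    return IRQ_INTX;
}
```

In this model, a host with remapped NVMe devices that skips model_remap_check gets IRQ_MSIX, which corresponds to the failing configuration reported above. Once the check has set the flag, the same host gets IRQ_INTX, the shared legacy interrupt described in the quoted comment.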

