* 3.2.57 regression: isci driver broken: Unable to reset I T nexus?
From: Ondrej Zary @ 2014-04-28 11:03 UTC
To: Intel SCU Linux support; +Cc: linux-scsi, linux-kernel

Hello,
just upgraded a server running 3.2.54-2 to 3.2.57-3 (Debian Wheezy) and it
does not boot anymore because of isci driver breakage.
A (partial) log transcription:
sas: DOING DISCOVERY on port 0, pid:5
sas: Enter sas_scsi_recover_host
ata1: sas eh calling libata port error handler
sas: sas_ata_hard_reset: Unable to reset I T nexus?
sas: sas_ata_hard_reset: Found ATA device.
sas: sas_ata_hard_reset: Unable to soft reset
sas: sas_ata_hard_reset: Found ATA device.
ata1: reset failed (errno=-11), retrying in 10 secs
sas: sas_ata_hard_reset: Unable to reset I T nexus?
sas: sas_ata_hard_reset: Found ATA device.
sas: sas_ata_hard_reset: Unable to soft reset
sas: sas_ata_hard_reset: Found ATA device.
ata1: reset failed (errno=-11), retrying in 35 secs
ata1: reset failed, giving up
sas: --- Exit sas_scsi_recover_host
sas: DONE DISCOVERY on port 0, pid: 5, result:0
sas: phy-0:1 added to port-0:1, phy_mask:0x2 (5fcfffff00000002)
sas: DOING DISCOVERY on port 1, pid:5
sas: Enter sas_scsi_recover_host
ata1: sas eh calling libata port error handler
sas: sas_ata_hard_reset: Unable to reset I T nexus?
sas: sas_ata_hard_reset: Found ATA device.
sas: sas_ata_hard_reset: Unable to soft reset
sas: sas_ata_hard_reset: Found ATA device.
ata2: reset failed (errno=-11), retrying in 10 secs
sas: sas_ata_hard_reset: Unable to reset I T nexus?
sas: sas_ata_hard_reset: Found ATA device.
sas: sas_ata_hard_reset: Unable to soft reset
sas: sas_ata_hard_reset: Found ATA device.
ata2: reset failed (errno=-11), retrying in 35 secs
ata2: reset failed, giving up
It should look like this (v3.2.54-2):
isci: Intel(R) C600 SAS Controller Driver - version 1.0.0
isci 0000:03:00.0: driver configured for rev: 6 silicon
isci 0000:03:00.0: firmware: agent loaded isci/isci_firmware.bin into memory
isci 0000:03:00.0: OEM SAS parameters (version: 1.3) loaded (firmware)
isci 0000:03:00.0: setting latency timer to 64
scsi0 : isci
scsi1 : isci
isci 0000:03:00.0: irq 81 for MSI/MSI-X
isci 0000:03:00.0: irq 82 for MSI/MSI-X
isci 0000:03:00.0: irq 83 for MSI/MSI-X
isci 0000:03:00.0: irq 84 for MSI/MSI-X
sas: phy-0:0 added to port-0:0, phy_mask:0x1 (5fcfffff00000001)
sas: DOING DISCOVERY on port 0, pid:5
sas: Enter sas_scsi_recover_host
ata1: sas eh calling libata port error handler
sas: sas_ata_hard_reset: Found ATA device.
ata1.00: ATA-8: ST9500620NS, CC02, max UDMA/133
ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
sas: --- Exit sas_scsi_recover_host
scsi 0:0:0:0: Direct-Access ATA ST9500620NS CC02 PQ: 0 ANSI: 5
sas: DONE DISCOVERY on port 0, pid:5, result:0
sas: phy-0:1 added to port-0:1, phy_mask:0x2 (5fcfffff00000002)
sas: DOING DISCOVERY on port 1, pid:5
sas: Enter sas_scsi_recover_host
ata1: sas eh calling libata port error handler
ata2: sas eh calling libata port error handler
sas: sas_ata_hard_reset: Found ATA device.
ata2.00: ATA-8: ST9500620NS, CC02, max UDMA/133
ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
sas: --- Exit sas_scsi_recover_host
scsi 0:0:1:0: Direct-Access ATA ST9500620NS CC02 PQ: 0 ANSI: 5
sas: DONE DISCOVERY on port 1, pid:5, result:0
--
Ondrej Zary

* Re: 3.2.57 regression: isci driver broken: Unable to reset I T nexus?
From: Ondrej Zary @ 2014-04-28 16:28 UTC
To: Jiang, Dave; +Cc: Dan Williams, intel-linux-scu, linux-scsi, linux-kernel

On Monday 28 April 2014 17:50:29 Jiang, Dave wrote:
> On Mon, 2014-04-28 at 13:03 +0200, Ondrej Zary wrote:
> > Hello,
> > just upgraded a server running 3.2.54-2 to 3.2.57-3 (Debian Wheezy) and
> > it does not boot anymore because of isci driver breakage.
>
> I would not run anything less than 3.8 for the isci controller. 3.2 is
> VERY old for that particular driver and likely very unstable. The
> product version of that driver plus libsas started with 3.8. Also I'm
> concerned that you aren't using the platform OEM parameters. You need to
> turn your OROM or EFI driver on for the SAS controller.

It's a Cisco UCS C22 M3 server with a crappy LSI fakeraid that cannot even be
disabled. It was a pain to make it boot properly - had to use dmraid. But it
has been working fine since then (2012). Until now.

I guess that it could be caused by the following commit but haven't tested it:

commit 584ec12265192bf49dfa270d517380f6723a6956
Author: Dan Williams <dan.j.williams@intel.com>
Date: Thu Feb 6 12:23:01 2014 -0800

> > A (partial) log transcription:
> > sas: DOING DISCOVERY on port 0, pid:5
> > sas: Enter sas_scsi_recover_host
> > ata1: sas eh calling libata port error handler
> > sas: sas_ata_hard_reset: Unable to reset I T nexus?
> > sas: sas_ata_hard_reset: Found ATA device.
> > sas: sas_ata_hard_reset: Unable to soft reset
> > sas: sas_ata_hard_reset: Found ATA device.
> > ata1: reset failed (errno=-11), retrying in 10 secs
> > sas: sas_ata_hard_reset: Unable to reset I T nexus?
> > sas: sas_ata_hard_reset: Found ATA device.
> > sas: sas_ata_hard_reset: Unable to soft reset
> > sas: sas_ata_hard_reset: Found ATA device.
> > ata1: reset failed (errno=-11), retrying in 35 secs
> > ata1: reset failed, giving up
> > sas: --- Exit sas_scsi_recover_host
> > sas: DONE DISCOVERY on port 0, pid: 5, result:0
> > sas: phy-0:1 added to port-0:1, phy_mask:0x2 (5fcfffff00000002)
> > sas: DOING DISCOVERY on port 1, pid:5
> > sas: Enter sas_scsi_recover_host
> > ata1: sas eh calling libata port error handler
> > sas: sas_ata_hard_reset: Unable to reset I T nexus?
> > sas: sas_ata_hard_reset: Found ATA device.
> > sas: sas_ata_hard_reset: Unable to soft reset
> > sas: sas_ata_hard_reset: Found ATA device.
> > ata2: reset failed (errno=-11), retrying in 10 secs
> > sas: sas_ata_hard_reset: Unable to reset I T nexus?
> > sas: sas_ata_hard_reset: Found ATA device.
> > sas: sas_ata_hard_reset: Unable to soft reset
> > sas: sas_ata_hard_reset: Found ATA device.
> > ata2: reset failed (errno=-11), retrying in 35 secs
> > ata2: reset failed, giving up
> >
> >
> > It should look like this (v3.2.54-2):
> > isci: Intel(R) C600 SAS Controller Driver - version 1.0.0
> > isci 0000:03:00.0: driver configured for rev: 6 silicon
> > isci 0000:03:00.0: firmware: agent loaded isci/isci_firmware.bin into memory
> > isci 0000:03:00.0: OEM SAS parameters (version: 1.3) loaded (firmware)
> > isci 0000:03:00.0: setting latency timer to 64
> > scsi0 : isci
> > scsi1 : isci
> > isci 0000:03:00.0: irq 81 for MSI/MSI-X
> > isci 0000:03:00.0: irq 82 for MSI/MSI-X
> > isci 0000:03:00.0: irq 83 for MSI/MSI-X
> > isci 0000:03:00.0: irq 84 for MSI/MSI-X
> > sas: phy-0:0 added to port-0:0, phy_mask:0x1 (5fcfffff00000001)
> > sas: DOING DISCOVERY on port 0, pid:5
> > sas: Enter sas_scsi_recover_host
> > ata1: sas eh calling libata port error handler
> > sas: sas_ata_hard_reset: Found ATA device.
> > ata1.00: ATA-8: ST9500620NS, CC02, max UDMA/133
> > ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> > ata1.00: configured for UDMA/133
> > sas: --- Exit sas_scsi_recover_host
> > scsi 0:0:0:0: Direct-Access ATA ST9500620NS CC02 PQ: 0 ANSI: 5
> > sas: DONE DISCOVERY on port 0, pid:5, result:0
> > sas: phy-0:1 added to port-0:1, phy_mask:0x2 (5fcfffff00000002)
> > sas: DOING DISCOVERY on port 1, pid:5
> > sas: Enter sas_scsi_recover_host
> > ata1: sas eh calling libata port error handler
> > ata2: sas eh calling libata port error handler
> > sas: sas_ata_hard_reset: Found ATA device.
> > ata2.00: ATA-8: ST9500620NS, CC02, max UDMA/133
> > ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> > ata2.00: configured for UDMA/133
> > sas: --- Exit sas_scsi_recover_host
> > scsi 0:0:1:0: Direct-Access ATA ST9500620NS CC02 PQ: 0 ANSI: 5
> > sas: DONE DISCOVERY on port 1, pid:5, result:0

--
Ondrej Zary

* Re: 3.2.57 regression: isci driver broken: Unable to reset I T nexus?
From: Dan Williams @ 2014-04-28 19:24 UTC
To: Ondrej Zary
Cc: Jiang, Dave, Paszkiewicz, Artur, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Hutchings

[ adding Ben ]

On Mon, Apr 28, 2014 at 10:22 AM, Ondrej Zary <linux@rainbow-software.org> wrote:
> On Monday 28 April 2014 18:51:44 Jiang, Dave wrote:
>> On Mon, 2014-04-28 at 16:28 +0000, Ondrej Zary wrote:
>> > On Monday 28 April 2014 17:50:29 Jiang, Dave wrote:
>> > > On Mon, 2014-04-28 at 13:03 +0200, Ondrej Zary wrote:
>> > > > Hello,
>> > > > just upgraded a server running 3.2.54-2 to 3.2.57-3 (Debian Wheezy)
>> > > > and it does not boot anymore because of isci driver breakage.
>> > >
>> > > I would not run anything less than 3.8 for the isci controller. 3.2 is
>> > > VERY old for that particular driver and likely very unstable. The
>> > > product version of that driver plus libsas started with 3.8. Also I'm
>> > > concerned that you aren't using the platform OEM parameters. You need
>> > > to turn your OROM or EFI driver on for the SAS controller.
>> >
>> > It's a Cisco UCS C22 M3 server with a crappy LSI fakeraid that cannot
>> > even be disabled. It was a pain to make it boot properly - had to use
>> > dmraid. But it has been working fine since then (2012). Until now.
>>
>> Yes but just because it has been working doesn't mean it is a good idea
>> to run unstable code.... You need the driver updates and the libsas
>> updates for it to function properly. Does this fail on 3.14? If it is
>> that patch I have a feeling it may be interacting badly with whatever is
>> was in 3.2 libsas that may not be a problem with latest kernels.... It
>> is odd to see all those hard resets however.... Did you have them when
>> it was working for you?
>
> Didn't know that it was unstable - it worked with no problems, better than
> some products marked as stable :)
> 3.13 works fine - I've installed it from wheezy-backports to work-around the
> bug.
>
> The log from working 3.2.54 is below (at the end) - there's one reset for each
> port.
>

I think the right answer for 3.2 is to drop commit 584ec1226519 "isci:
fix reset timeout handling". libsas and its libata interaction went
through significant overhaul after 3.2 so it's not surprising that a
change to reset handling regresses like this. Ideally there would be
a backport of latest libsas available for 3.2, but no one to my
knowledge is working on that.

--
Dan

* Re: 3.2.57 regression: isci driver broken: Unable to reset I T nexus?
From: Ben Hutchings @ 2014-04-30 12:30 UTC
To: Dan Williams
Cc: Ondrej Zary, Jiang, Dave, Paszkiewicz, Artur, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, stable

I'm adding this revert to 3.2.58, taking your 'drop commit 584ec1226519'
as an ack.

Ben.

---
From: Ben Hutchings <ben@decadent.org.uk>
Date: Wed, 30 Apr 2014 13:22:22 +0100
Subject: Revert "isci: fix reset timeout handling"

This reverts commit 584ec12265192bf49dfa270d517380f6723a6956, which was
commit ddfadd7736b677de2d4ca2cd5b4b655368c85a7a upstream. It causes boot
failure on 3.2 although no such problem occurs upstream.

Reported-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Dan Williams <dan.j.williams@intel.com>
---
--- a/drivers/scsi/isci/port_config.c
+++ b/drivers/scsi/isci/port_config.c
@@ -610,6 +610,13 @@ static void sci_apc_agent_link_up(struct
 		sci_apc_agent_configure_ports(ihost, port_agent, iphy, true);
 	} else {
 		/* the phy is already the part of the port */
+		u32 port_state = iport->sm.current_state_id;
+
+		/* if the PORT'S state is resetting then the link up is from
+		 * port hard reset in this case, we need to tell the port
+		 * that link up is recieved
+		 */
+		BUG_ON(port_state != SCI_PORT_RESETTING);
 		port_agent->phy_ready_mask |= 1 << phy_index;
 		sci_port_link_up(iport, iphy);
 	}
--- a/drivers/scsi/isci/task.c
+++ b/drivers/scsi/isci/task.c
@@ -1390,7 +1390,7 @@ int isci_task_I_T_nexus_reset(struct dom
 	spin_unlock_irqrestore(&ihost->scic_lock, flags);
 
 	if (!idev || !test_bit(IDEV_EH, &idev->flags)) {
-		ret = -ENODEV;
+		ret = TMF_RESP_FUNC_COMPLETE;
 		goto out;
 	}
 
--
Ben Hutchings
Life would be so much easier if we could look at the source code.
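
[ Editorial sketch, not part of the original thread. ] The task.c hunk of
the revert changes what isci_task_I_T_nexus_reset() returns when the device
is missing or not in error handling: -ENODEV with commit 584ec1226519
applied, TMF_RESP_FUNC_COMPLETE once it is reverted. The user-space model
below illustrates why that could produce the "Unable to reset I T nexus?"
lines in the failing log. It assumes, without the thread confirming it, that
the 3.2 libsas caller treats any return value other than
TMF_RESP_FUNC_COMPLETE as a failed reset. The functions model_nexus_reset()
and model_hard_reset() are hypothetical names for this sketch, not kernel
functions, and the value 0 for TMF_RESP_FUNC_COMPLETE is likewise an
assumption.

```c
#include <errno.h>

/* Assumed value for this sketch; the real definition lives in the kernel. */
#define TMF_RESP_FUNC_COMPLETE 0x00

/*
 * Hypothetical stand-in for the early-exit path touched by the revert.
 * device_in_eh: non-zero if the device exists and is in error handling.
 * with_commit:  non-zero models 3.2.57 (commit 584ec1226519 applied),
 *               zero models the reverted behaviour.
 */
int model_nexus_reset(int device_in_eh, int with_commit)
{
	if (!device_in_eh)
		return with_commit ? -ENODEV : TMF_RESP_FUNC_COMPLETE;

	/* For simplicity, assume the actual reset path succeeds. */
	return TMF_RESP_FUNC_COMPLETE;
}

/*
 * Hypothetical caller: anything other than TMF_RESP_FUNC_COMPLETE is
 * reported as a failure (this is the assumption about 3.2 libsas).
 */
const char *model_hard_reset(int res)
{
	if (res != TMF_RESP_FUNC_COMPLETE)
		return "Unable to reset I T nexus?";
	return "reset ok";
}
```

Calling model_hard_reset(model_nexus_reset(0, 1)) yields the failure string
seen in the 3.2.57 log, while model_hard_reset(model_nexus_reset(0, 0))
yields "reset ok". If the assumption about the caller holds, this matches
Dan's point that a change to reset handling made for newer libsas can
regress against the older 3.2 code.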