* Error handling on FC devices
From: Hannes Reinecke @ 2012-11-19 12:41 UTC
To: SCSI Mailing List
Cc: Andrew Vasquez, Chad Dupuis, James Smart, James Bottomley

Hi all,

just when we thought we'd finally nailed the error handling on FC ...
A customer of ours recently hit this really nasty issue:
He had a 'drain' on the SAN, in the sense that the link was still
intact, but no commands were coming back from the link.

This caused the FC HBA / driver to not detect a link down, and so the
failing command was pushed onto the error handler.
Which of course resorted to an HBA reset, but by that time the
cluster had already kicked out the machine.
And as all machines in the cluster were connected to the same switch
this happened to all machines, resulting in a nice cluster shutdown.
And a really unhappy customer.

Looking closer, multipathing actually managed to detect and switch
paths as desired, but as the initial failing command was pushed onto
the error handler all applications had to wait for this command to
finish before proceeding.

So the following questions:
- Why did the FC HBA not detect a 'link-down' scenario?
  (Incidentally, this happens with QLogic _and_ Emulex :-)
  I know this is not a typical link-down, but from my naive
  assumption the HBA should detect that commands are not
  making progress, and at least after R_A_TOV had expired
  it should try to reset the link.
- Can we speed up error handling for these cases?
  Currently we're waiting for eh to complete before returning
  the affected commands with a final state.
  However, after we've done a LUN reset there shouldn't be
  any command state left and we should be able to terminate
  outstanding commands directly, without having to wait for
  eh to finally complete. James?

Thanks.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Error handling on FC devices
From: James Smart @ 2012-11-26 22:32 UTC
To: Hannes Reinecke
Cc: SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

On 11/19/2012 7:41 AM, Hannes Reinecke wrote:
> Hi all,
>
> just when we thought we'd finally nailed the error handling on FC ...
> A customer of ours recently hit this really nasty issue:
> He had a 'drain' on the SAN, in the sense that the link was still
> intact, but no commands were coming back from the link.
>
> This caused the FC HBA / driver to not detect a link down, and so the
> failing command was pushed onto the error handler.
> Which of course resorted to an HBA reset, but by that time the
> cluster had already kicked out the machine.
> And as all machines in the cluster were connected to the same switch
> this happened to all machines, resulting in a nice cluster shutdown.
> And a really unhappy customer.
>
> Looking closer, multipathing actually managed to detect and switch
> paths as desired, but as the initial failing command was pushed onto
> the error handler all applications had to wait for this command to
> finish before proceeding.
>
> So the following questions:
> - Why did the FC HBA not detect a 'link-down' scenario?
>   (Incidentally, this happens with QLogic _and_ Emulex :-)
>   I know this is not a typical link-down, but from my naive
>   assumption the HBA should detect that commands are not
>   making progress, and at least after R_A_TOV had expired
>   it should try to reset the link.

Link up/down is almost always the state of the physical link - TX
signal is active, and on the RX side, we have negotiated speed,
acquired sync, and are seeing valid characters. It has nothing to do
with packet transmission on the link, which is a different story.
There is, within the FC standard, tracking of credits on the link,
which could reset it (although that is a link reset, and yours may be
a different definition). So as long as the other end kept its link up,
and we saw valid characters - the link is fine.

From the SCSI perspective - there are no requirements about how long a
command takes (consider format commands - which could take hours
between the cmd and the response). There is no definition of "making
progress" that can be enforced. We have the i/o timers - which usually
default to 30s/60s/90s. R_A_TOV (10s) is too short compared with
these - especially when considering some transparent failover arrays
(2 pieces of hardware, both on the link, but only 1 responding - and
after one fails, the other takes over its personality, taking about
90s to do so, and resumes the i/os from the new hardware; for much of
this window there may be no traffic, and it's still "good").

Additionally, there is no requirement that all targets be in use at all
times - you could come up with a situation where 1 target is
influencing the link activity decision, thus invoking the link bounce,
and disrupting the i/o load on other targets that are fine. Low
probability, but possible.

In general, lack of activity is a good indicator, but that's it, only
an indicator. Not great for a hard policy "choice". Also, you're asking
low-level designs to now do something new (time inter-i/o gaps, and
aggregate gaps), which they may not be prepared to do.

> - Can we speed up error handling for these cases?
>   Currently we're waiting for eh to complete before returning
>   the affected commands with a final state.
>   However, after we've done a LUN reset there shouldn't be
>   any command state left and we should be able to terminate
>   outstanding commands directly, without having to wait for
>   eh to finally complete. James?

Theoretically, I agree - the affected command only has to stall long
enough to ensure its own cancellation, which could be just the io
abort. True, if the abort is not successful, then you still don't know
the status, so you have to escalate the type of recovery to try to
cancel, etc. I expect, given the limbo state of the i/o on lower eh
failures, you do have to wait to ensure it's "cancelled", at least from
a generic scsi point of view. You could try to optimize the local
system view - where as long as the LLDD ensures it's cancelled, and
will protocol-wise ensure no bad side effects, then you could release
it earlier in the eh escalation. I don't believe we have a way for the
LLDD to give such a notice to the midlayer. Given all the grey areas
you touch on, especially across different types of scsi protocols and
hardware, it doesn't surprise me we are waiting until we have
confirmation of cancellation before continuing.

Given path switching is somewhat separate from the i/o, would it be
better to send a notification of a path-fail condition as part of the
eh, rather than hinging it on the individual i/o? Yes, the i/o is still
in limbo and can't be switched to the new path, but other i/o could
without incurring the delay.

-- james s

>
> Thanks.
>
> Cheers,
>
> Hannes
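The escalation described in the reply above, where a timed-out command stays in limbo until some recovery step positively confirms its cancellation, can be sketched roughly as follows. This is a toy Python model and not kernel code: the step names follow the usual SCSI EH ladder (abort, LUN reset, target reset, bus reset, host reset), while `escalate`, `make_ladder`, and the per-step callbacks are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

# Recovery steps, from least to most disruptive (names are illustrative).
LADDER = ["abort", "lun_reset", "target_reset", "bus_reset", "host_reset"]

# Each step is (name, callback); the callback returns True only when the
# step positively confirms that the stuck command is cancelled.
Step = Tuple[str, Callable[[], bool]]


def make_ladder(working: Set[str]) -> List[Step]:
    """Build a ladder in which only the steps named in `working` succeed."""
    return [(name, (lambda n=name: n in working)) for name in LADDER]


def escalate(steps: List[Step]) -> Tuple[Optional[str], List[str]]:
    """Try each step in order until one confirms cancellation.

    Returns (name of the confirming step or None, list of steps tried).
    The command may only be released once a step has confirmed.
    """
    tried: List[str] = []
    for name, confirms_cancel in steps:
        tried.append(name)
        if confirms_cancel():
            return name, tried
    return None, tried


# Drained-SAN case: nothing short of a host reset gets an answer, so the
# command is held through the entire ladder.
released_by, tried = escalate(make_ladder({"host_reset"}))
```

In this model the cost of the "drain" scenario is visible directly: `tried` contains every step, because each lower-level step has to fail (typically by timing out) before the next one runs.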
* Re: Error handling on FC devices
From: Ewan Milne @ 2012-11-27 20:03 UTC
To: James.Smart
Cc: Hannes Reinecke, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

On Mon, 2012-11-26 at 17:32 -0500, James Smart wrote:
> Given path switching is somewhat separate from the i/o, would it be
> better to send a notification of a path-fail condition as part of the
> eh, rather than hinging it on the individual i/o? Yes, the i/o is still
> in limbo and can't be switched to the new path, but other i/o could
> without incurring the delay.

Is there a potential issue with a write that is taking a long time on
one path, which could cause path switching for subsequent writes to
another path, before the disposition of the first write is known?
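The hazard raised in this question can be made concrete with a small model. The sketch below is a hedged Python illustration, not a description of any real multipath implementation: the `Write` record, the `final_contents` helper, and all timestamps are hypothetical, and the target is assumed to simply commit writes in the order they finally arrive.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Write:
    lba: int
    data: str
    issued_at: float  # when the initiator sent the write
    lands_at: float   # when the target actually commits it


def final_contents(writes: List[Write]) -> Dict[int, str]:
    """Apply writes in the order the target commits them; return lba -> data."""
    blocks: Dict[int, str] = {}
    for w in sorted(writes, key=lambda w: w.lands_at):
        blocks[w.lba] = w.data
    return blocks


# The first write stalls on path A; a later write to the same block is
# sent down path B after the path switch and completes quickly.
stalled = Write(lba=100, data="old", issued_at=0.0, lands_at=45.0)
follow_up = Write(lba=100, data="new", issued_at=1.0, lands_at=1.2)

# If the stalled write is still alive when it finally lands, stale data wins.
hazard = final_contents([stalled, follow_up])

# If the stalled write is known to be cancelled, only the follow-up lands.
safe = final_contents([follow_up])
```

The difference between `hazard` and `safe` is exactly the "disposition of the first write": until it is known to be cancelled, later writes to the same blocks on another path are not safe.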
* RE: Error handling on FC devices
From: Elliott, Robert (Server Storage) @ 2012-11-27 20:29 UTC
To: emilne@redhat.com, James.Smart@emulex.com
Cc: Hannes Reinecke, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

There is a new command in SPC-4 called REMOVE I_T NEXUS that is intended
to help in that situation. REMOVE I_T NEXUS lets the application client
use a good I_T nexus to abort commands that were being processed on a bad
I_T nexus, so it can safely re-issue those commands on the good I_T nexus
without worrying that the original commands might resume.

In contrast:
- ABORT TASK, ABORT TASK SET, and CLEAR TASK SET must use the same
  I_T nexus as the commands being aborted, so they are not viable if
  that I_T nexus is bad
- LOGICAL UNIT RESET aborts commands from every I_T nexus, so in
  addition to aborting commands from the bad I_T nexus it also affects
  commands on the good I_T nexus

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Ewan Milne
> Sent: Tuesday, 27 November, 2012 2:03 PM
> To: James.Smart@emulex.com
> Cc: Hannes Reinecke; SCSI Mailing List; Andrew Vasquez; Chad Dupuis;
> James Bottomley
> Subject: Re: Error handling on FC devices
>
> On Mon, 2012-11-26 at 17:32 -0500, James Smart wrote:
> > Given path switching is somewhat separate from the i/o, would it be
> > better to send a notification of a path-fail condition as part of the
> > eh, rather than hinging it on the individual i/o? Yes, the i/o is still
> > in limbo and can't be switched to the new path, but other i/o could
> > without incurring the delay.
>
> Is there a potential issue with a write that is taking a long time on
> one path, which could cause path switching for subsequent writes to
> another path, before the disposition of the first write is known?
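The difference in scope between the task management functions compared above can be illustrated with a small model. This is a hedged Python sketch of the semantics as stated in this message, not of the SPC-4 text itself: the `LogicalUnit` class, its method names, and the path labels are hypothetical, and a "bad" I_T nexus is modeled simply as one over which no request is delivered.

```python
from typing import Dict, Set


class LogicalUnit:
    """Toy logical unit tracking outstanding command tags per I_T nexus."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Set[int]] = {}   # nexus -> outstanding tags
        self.bad_nexuses: Set[str] = set()     # nexuses that deliver nothing

    def queue(self, nexus: str, tag: int) -> None:
        self.tasks.setdefault(nexus, set()).add(tag)

    def _delivered(self, via_nexus: str) -> bool:
        return via_nexus not in self.bad_nexuses

    def abort_task_set(self, via_nexus: str) -> bool:
        # Only reaches commands on the nexus the request itself arrives on.
        if not self._delivered(via_nexus):
            return False
        self.tasks[via_nexus] = set()
        return True

    def logical_unit_reset(self, via_nexus: str) -> bool:
        # Aborts commands from every nexus, good or bad.
        if not self._delivered(via_nexus):
            return False
        for nexus in self.tasks:
            self.tasks[nexus] = set()
        return True

    def remove_i_t_nexus(self, via_nexus: str, victim_nexus: str) -> bool:
        # Sent over a good nexus, aborts only the victim nexus's commands.
        if not self._delivered(via_nexus):
            return False
        self.tasks[victim_nexus] = set()
        return True


def demo_lu() -> LogicalUnit:
    """Two commands stuck on a bad path A, one healthy command on path B."""
    lu = LogicalUnit()
    lu.queue("path_a", 1)
    lu.queue("path_a", 2)
    lu.queue("path_b", 3)
    lu.bad_nexuses.add("path_a")
    return lu
```

With `demo_lu()`, `abort_task_set("path_a")` never arrives, `logical_unit_reset("path_b")` clears path B's healthy command as well, and `remove_i_t_nexus("path_b", "path_a")` clears only the stuck commands.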
* Re: Error handling on FC devices
From: Hannes Reinecke @ 2012-11-28 7:09 UTC
To: Elliott, Robert (Server Storage)
Cc: emilne@redhat.com, James.Smart@emulex.com, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

On 11/27/2012 09:29 PM, Elliott, Robert (Server Storage) wrote:
> There is a new command in SPC-4 called REMOVE I_T NEXUS that is intended
> to help in that situation. REMOVE I_T NEXUS lets the application client
> use a good I_T nexus to abort commands that were being processed on a bad
> I_T nexus, so it can safely re-issue those commands on the good I_T nexus
> without worrying that the original commands might resume.
>
> In contrast:
> - ABORT TASK, ABORT TASK SET, and CLEAR TASK SET must use the same
>   I_T nexus as the commands being aborted, so they are not viable if
>   that I_T nexus is bad
> - LOGICAL UNIT RESET aborts commands from every I_T nexus, so in
>   addition to aborting commands from the bad I_T nexus it also affects
>   commands on the good I_T nexus
>
Hmm. Nice in principle, but the problem here is that we cannot
guarantee the nexus is still intact.
So one would need to implement this in the HBA firmware; the firmware
would need to be able to process the TMF, and do the appropriate
things for the FC stack like dropping the rport etc.

Good idea, though.

James, Andrew, Chad?
Any chance of having a firmware supporting REMOVE I_T NEXUS?

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: Error handling on FC devices
From: James Smart @ 2012-11-29 16:02 UTC
To: Hannes Reinecke
Cc: Elliott, Robert (Server Storage), emilne@redhat.com, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

Always possible - but.... Our f/w works at the FCP level and below,
which means it doesn't know/do SCSI commands - e.g. it doesn't know what
the cdb within the FCP CMD frame is, or anything about SCSI device
classes and state, etc. And it shouldn't be required to do so. Any time
this has been done in the past, it's been problematic.

If we want to do this - we should add it to the midlayer/transport.

-- james s

On 11/28/2012 2:09 AM, Hannes Reinecke wrote:
> On 11/27/2012 09:29 PM, Elliott, Robert (Server Storage) wrote:
>> There is a new command in SPC-4 called REMOVE I_T NEXUS that is intended
>> to help in that situation. REMOVE I_T NEXUS lets the application client
>> use a good I_T nexus to abort commands that were being processed on a bad
>> I_T nexus, so it can safely re-issue those commands on the good I_T nexus
>> without worrying that the original commands might resume.
>>
>> In contrast:
>> - ABORT TASK, ABORT TASK SET, and CLEAR TASK SET must use the same
>>   I_T nexus as the commands being aborted, so they are not viable if
>>   that I_T nexus is bad
>> - LOGICAL UNIT RESET aborts commands from every I_T nexus, so in
>>   addition to aborting commands from the bad I_T nexus it also affects
>>   commands on the good I_T nexus
>>
> Hmm. Nice in principle, but the problem here is that we cannot
> guarantee the nexus is still intact.
> So one would need to implement this in the HBA firmware; the firmware
> would need to be able to process the TMF, and do the appropriate
> things for the FC stack like dropping the rport etc.
>
> Good idea, though.
>
> James, Andrew, Chad?
> Any chance of having a firmware supporting REMOVE I_T NEXUS?
>
> Cheers,
>
> Hannes
* Re: Error handling on FC devices
From: Hannes Reinecke @ 2012-11-30 11:44 UTC
To: James.Smart
Cc: Elliott, Robert (Server Storage), emilne@redhat.com, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

On 11/29/2012 05:02 PM, James Smart wrote:
> Always possible - but.... Our f/w works at the FCP level and below,
> which means it doesn't know/do SCSI commands - e.g. it doesn't know what
> the cdb within the FCP CMD frame is, or anything about SCSI device
> classes and state, etc. And it shouldn't be required to do so. Any time
> this has been done in the past, it's been problematic.
>
> If we want to do this - we should add it to the midlayer/transport.
>
D'accord. Transport layer looks like a good fit.

What we should be doing is hooking up 'bus_reset' to be equivalent to
REMOVE I_T NEXUS (SAS is already doing this).

In our case a REMOVE I_T NEXUS would be roughly equivalent to
fc_remote_port_delete(); only we should start aborting
outstanding I/O directly and not wait for fast_io_fail_tmo
to kick in.

I'll be posting a patch.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: Error handling on FC devices
From: Mike Christie @ 2012-11-30 16:54 UTC
To: Hannes Reinecke
Cc: James.Smart, Elliott, Robert (Server Storage), emilne@redhat.com, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

On 11/30/2012 05:44 AM, Hannes Reinecke wrote:
> On 11/29/2012 05:02 PM, James Smart wrote:
>> Always possible - but.... Our f/w works at the FCP level and below,
>> which means it doesn't know/do SCSI commands - e.g. it doesn't know what
>> the cdb within the FCP CMD frame is, or anything about SCSI device
>> classes and state, etc. And it shouldn't be required to do so. Any time
>> this has been done in the past, it's been problematic.
>>
>> If we want to do this - we should add it to the midlayer/transport.
>>
> D'accord. Transport layer looks like a good fit.
>
> What we should be doing is hooking up 'bus_reset' to be equivalent to
> REMOVE I_T NEXUS (SAS is already doing this).

Do you mean the scsi eh bus reset callout, and if so, does that work on
multiple targets while REMOVE I_T NEXUS will only operate on one at a
time? I think it would be cleaner to add a new callout that works like
the target reset one, where the scsi-ml loops over the targets for the
drivers.

>
> In our case a REMOVE I_T NEXUS would be roughly equivalent to
> fc_remote_port_delete(); only we should start aborting
> outstanding I/O directly and not wait for fast_io_fail_tmo
> to kick in.
>

To abort IO, will you be calling the driver's terminate_rport_io or
dev_loss_tmo_callbk? If so I just wanted to warn you that I noticed that
some drivers will only initiate the aborting/cleanup of IO in there. So
if you call those callouts and expect that, when they have finished,
scsi-ml can free the scsi command and pass the request back up, I think
we could hit some races with memory issues.
* Re: Error handling on FC devices
From: Hannes Reinecke @ 2012-12-03 7:15 UTC
To: Mike Christie
Cc: James.Smart, Elliott, Robert (Server Storage), emilne@redhat.com, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

On 11/30/2012 05:54 PM, Mike Christie wrote:
> On 11/30/2012 05:44 AM, Hannes Reinecke wrote:
>> On 11/29/2012 05:02 PM, James Smart wrote:
>>> Always possible - but.... Our f/w works at the FCP level and below,
>>> which means it doesn't know/do SCSI commands - e.g. it doesn't know what
>>> the cdb within the FCP CMD frame is, or anything about SCSI device
>>> classes and state, etc. And it shouldn't be required to do so. Any time
>>> this has been done in the past, it's been problematic.
>>>
>>> If we want to do this - we should add it to the midlayer/transport.
>>>
>> D'accord. Transport layer looks like a good fit.
>>
>> What we should be doing is hooking up 'bus_reset' to be equivalent to
>> REMOVE I_T NEXUS (SAS is already doing this).
>
> Do you mean the scsi eh bus reset callout, and if so, does that work on
> multiple targets while REMOVE I_T NEXUS will only operate on one at a
> time? I think it would be cleaner to add a new callout that works like
> the target reset one, where the scsi-ml loops over the targets for the
> drivers.
>
Well, looking at QLogic and Emulex, both emulate a bus reset by
looping over each target and invoking a target reset there.
I somewhat fail to see the rationale behind it, other than emulating
the bus reset behaviour on SPI.
Given that the original target reset already failed (otherwise we
wouldn't be doing a bus reset), I doubt a _second_ target reset
will lead to a different result.

So invoking REMOVE I_T NEXUS here can only improve matters :-)

I'm all for renaming bus_reset, though :-)

>>
>> In our case a REMOVE I_T NEXUS would be roughly equivalent to
>> fc_remote_port_delete(); only we should start aborting
>> outstanding I/O directly and not wait for fast_io_fail_tmo
>> to kick in.
>>
>
> To abort IO, will you be calling the driver's terminate_rport_io or
> dev_loss_tmo_callbk? If so I just wanted to warn you that I noticed that
> some drivers will only initiate the aborting/cleanup of IO in there. So
> if you call those callouts and expect that, when they have finished,
> scsi-ml can free the scsi command and pass the request back up, I think
> we could hit some races with memory issues.
>
Yeah, I know. What I had in mind was to invoke terminate_rport_io()
and then wait for a certain time until either all outstanding commands
have been processed (i.e. starget->target_busy drops to zero) or the
port state has changed.
I'm not quite sure how long I should be waiting, but
dev_loss_tmo will be a good upper limit here.

As said, I'll be posting a patch.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
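The wait proposed in the message above (call terminate_rport_io(), then wait until either the target's busy count drops to zero or the port state changes, with dev_loss_tmo as the upper bound) might look roughly like the following. This is a Python sketch of the control flow only, under stated assumptions: `wait_for_drain`, `FakeClock`, the polling interval, and the injected `busy`, `port_state`, `now`, and `sleep` callables are hypothetical stand-ins, not the kernel's FC transport API.

```python
import time
from typing import Callable


def wait_for_drain(busy: Callable[[], int],
                   port_state: Callable[[], str],
                   dev_loss_tmo: float,
                   poll: float = 0.1,
                   now: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Wait after I/O termination has been requested.

    Returns 'drained' when no commands remain outstanding, 'port_changed'
    when the port state differs from the state seen on entry, or 'timeout'
    once dev_loss_tmo seconds have elapsed.
    """
    initial_state = port_state()
    deadline = now() + dev_loss_tmo
    while True:
        if busy() == 0:
            return "drained"
        if port_state() != initial_state:
            return "port_changed"
        if now() >= deadline:
            return "timeout"
        sleep(poll)


class FakeClock:
    """Deterministic clock so the loop can be exercised without real delays."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, dt: float) -> None:
        self.t += dt
```

Checking the busy count before the port state means a command that completes at the same moment the port changes is still reported as drained; whether that ordering is right for a real implementation is a design choice this sketch does not settle.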
* Re: Error handling on FC devices
From: Jeremy Linton @ 2012-12-03 17:19 UTC
To: Hannes Reinecke, Linux Scsi

On 12/3/2012 1:15 AM, Hannes Reinecke wrote:
> Well, looking at QLogic and Emulex, both emulate a bus reset by
> looping over each target and invoking a target reset there.
> I somewhat fail to see the rationale behind it, other than emulating
> the bus reset behaviour on SPI.

It is actually a _VERY_ bad idea in multiple initiator tape
environments with switched fibre, where the resets can affect devices
that are visible but not owned/controlled by the machine broadcasting
resets. Many tape environments operate this way, as the physical drives
are assigned dynamically to initiators as necessary. In some cases
(ACSLS) the machine/OS/backup applications aren't even homogeneous. The
rewind and loss of PR etc., if not handled properly by all the other
machines on the SAN, can be quite disastrous.

It's also somewhat problematic even in single initiator environments, as
the reset can affect devices not having problems, and the 6/2900 unit
attentions can get eaten by the logic attempting the reset, which
leaves the user of a functional device in the dark that it was
reset/rewound.

I was told last time I brought this up that it was impossible for a
single device's failure to result in that bus reset path being called.
That was patently false, as the problem was only tracked down because of
a repeatable case of a single device failing in a manner which
triggered progressively more aggressive recovery, culminating in the
bus reset being called. The result was a single device cascading a
failure to a bunch of functional devices and interrupting their
operation.
* RE: Error handling on FC devices
From: Elliott, Robert (Server Storage) @ 2012-12-03 22:52 UTC
To: Hannes Reinecke, Mike Christie
Cc: James.Smart@emulex.com, emilne@redhat.com, SCSI Mailing List, Andrew Vasquez, Chad Dupuis, James Bottomley

> Well, looking at QLogic and Emulex, both emulate a bus reset by
> looping over each target and invoking a target reset there.
> I somewhat fail to see the rationale behind it, other than emulating
> the bus reset behaviour on SPI.
> Given that the original target reset already failed (otherwise we
> wouldn't be doing a bus reset), I doubt a _second_ target reset
> will lead to a different result.
>
> So invoking REMOVE I_T NEXUS here can only improve matters :-)
>
> I'm all for renaming bus_reset, though :-)
>

Since modern storage fabrics don't really have a "bus" to reset any more,
a bolder approach of getting rid of bus_reset might be warranted.
* Re: Error handling on FC devices
From: Kipp Aldrich @ 2012-12-04 15:56 UTC
Cc: SCSI Mailing List

On 12/03/2012 04:52 PM, Elliott, Robert (Server Storage) wrote:
>
>> Well, looking at QLogic and Emulex, both emulate a bus reset by
>> looping over each target and invoking a target reset there.
>> I somewhat fail to see the rationale behind it, other than emulating
>> the bus reset behaviour on SPI.
>> Given that the original target reset already failed (otherwise we
>> wouldn't be doing a bus reset), I doubt a _second_ target reset
>> will lead to a different result.
>>
>> So invoking REMOVE I_T NEXUS here can only improve matters :-)
>>
>> I'm all for renaming bus_reset, though :-)
>>
>
> Since modern storage fabrics don't really have a "bus" to reset any more,
> a bolder approach of getting rid of bus_reset might be warranted.
>

This is exactly what we do now. We modify drivers to disable
bus_resets. Bus reset is dangerous and disruptive to reliable data
storage, especially on SANs. At the very least, make it optional and
disable it by default.

Bus resets can destroy or lose customer data, for goodness sake. Any
commercial vendor using Linux with these drivers that doesn't disable
bus resets is risking their customers' data.

Really, how early '80s to have bus_reset. Time to move on.