* Stable backports for qla2xxx target mode @ 2015-08-13 21:00 Nicholas A. Bellinger 2015-08-14 0:49 ` Greg KH 0 siblings, 1 reply; 9+ messages in thread From: Nicholas A. Bellinger @ 2015-08-13 21:00 UTC (permalink / raw) To: Greg KH Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran, swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez Hi Greg-KH, This is a follow up on a handful of 'WTF' emails for some of the recent qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable. a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive df673274 qla2xxx: added sess generations to detect RSCN update races d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode 7359df25 qla2xxx: terminate exchange when command is aborted by LIO 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted These are all bug-fix patches that address real-world correctness issues reported by a large customer of the qla2xxx target code, and have been reviewed + tested + signed-off-by the HW LLD maintainer. Granted these patches are larger in size that I'd normally be comfortable with CC'ing for stable, but they do address real-world correctness issues seen in large scale production with qla2xxx target code. Please consider adding these to your stable queue, to go along with the qla2xxx target patches from the same series that have already made it into v4.1.5. Thank you, --nab ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-13 21:00 Stable backports for qla2xxx target mode Nicholas A. Bellinger @ 2015-08-14 0:49 ` Greg KH 2015-08-14 0:55 ` Nicholas A. Bellinger 2015-08-14 18:37 ` Roland Dreier 0 siblings, 2 replies; 9+ messages in thread From: Greg KH @ 2015-08-14 0:49 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran, swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote: > Hi Greg-KH, > > This is a follow up on a handful of 'WTF' emails for some of the recent > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable. > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive > df673274 qla2xxx: added sess generations to detect RSCN update races > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted > > These are all bug-fix patches that address real-world correctness issues > reported by a large customer of the qla2xxx target code, and have been > reviewed + tested + signed-off-by the HW LLD maintainer. And they are all fricking huge rewrites and additions to the driver, none of which actually look like they should be added to a stable tree at all. > Granted these patches are larger in size that I'd normally be > comfortable with CC'ing for stable, but they do address real-world > correctness issues seen in large scale production with qla2xxx target > code. > > Please consider adding these to your stable queue, to go along with the > qla2xxx target patches from the same series that have already made it > into v4.1.5. What exactly are they fixing? They look like they add a ton of new functions to the driver, and other features. What is so broken in the driver today that warrants this type of exception to the rules? thanks, greg k-h ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-14 0:49 ` Greg KH @ 2015-08-14 0:55 ` Nicholas A. Bellinger 2015-08-14 1:09 ` Greg KH 2015-08-14 18:37 ` Roland Dreier 1 sibling, 1 reply; 9+ messages in thread From: Nicholas A. Bellinger @ 2015-08-14 0:55 UTC (permalink / raw) To: Greg KH Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran, swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote: > On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote: > > Hi Greg-KH, > > > > This is a follow up on a handful of 'WTF' emails for some of the recent > > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable. > > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive > > df673274 qla2xxx: added sess generations to detect RSCN update races > > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode > > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted > > > > These are all bug-fix patches that address real-world correctness issues > > reported by a large customer of the qla2xxx target code, and have been > > reviewed + tested + signed-off-by the HW LLD maintainer. > > And they are all fricking huge rewrites and additions to the driver, > none of which actually look like they should be added to a stable tree > at all. Each of which addresses bugs that the largest consumer of the code (Pure Storage) has encountered in production. > > > Granted these patches are larger in size that I'd normally be > > comfortable with CC'ing for stable, but they do address real-world > > correctness issues seen in large scale production with qla2xxx target > > code. > > > > Please consider adding these to your stable queue, to go along with the > > qla2xxx target patches from the same series that have already made it > > into v4.1.5. > > What exactly are they fixing? They look like they add a ton of new > functions to the driver, and other features. What is so broken in the > driver today that warrants this type of exception to the rules? > The commit logs go into details about the exact issues, but my understanding is that it's a combination of OOPsen, potential data-corruption due to duplicate sessions, and couple different resource leaks. Pure Storage and Qlogic folks, would you be so kind as to comment more on the specifics for Greg-KH to consider..? Thank you, --nab ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-14 0:55 ` Nicholas A. Bellinger @ 2015-08-14 1:09 ` Greg KH 2015-08-14 1:13 ` Nicholas A. Bellinger 0 siblings, 1 reply; 9+ messages in thread From: Greg KH @ 2015-08-14 1:09 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran, swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez On Thu, Aug 13, 2015 at 05:55:48PM -0700, Nicholas A. Bellinger wrote: > On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote: > > On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote: > > > Hi Greg-KH, > > > > > > This is a follow up on a handful of 'WTF' emails for some of the recent > > > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable. > > > > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted > > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive > > > df673274 qla2xxx: added sess generations to detect RSCN update races > > > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode > > > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO > > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR > > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted > > > > > > These are all bug-fix patches that address real-world correctness issues > > > reported by a large customer of the qla2xxx target code, and have been > > > reviewed + tested + signed-off-by the HW LLD maintainer. > > > > And they are all fricking huge rewrites and additions to the driver, > > none of which actually look like they should be added to a stable tree > > at all. > > Each of which addresses bugs that the largest consumer of the code (Pure > Storage) has encountered in production. > > > > > > Granted these patches are larger in size that I'd normally be > > > comfortable with CC'ing for stable, but they do address real-world > > > correctness issues seen in large scale production with qla2xxx target > > > code. > > > > > > Please consider adding these to your stable queue, to go along with the > > > qla2xxx target patches from the same series that have already made it > > > into v4.1.5. > > > > What exactly are they fixing? They look like they add a ton of new > > functions to the driver, and other features. What is so broken in the > > driver today that warrants this type of exception to the rules? > > > > The commit logs go into details about the exact issues, but my > understanding is that it's a combination of OOPsen, potential > data-corruption due to duplicate sessions, and couple different > resource leaks. > > Pure Storage and Qlogic folks, would you be so kind as to comment more > on the specifics for Greg-KH to consider..? I'd like the scsi maintainer to also approve such an exception, I notice they didn't even sign off on these patches :( greg k-h ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-14 1:09 ` Greg KH @ 2015-08-14 1:13 ` Nicholas A. Bellinger 2015-08-14 1:21 ` Nicholas A. Bellinger 0 siblings, 1 reply; 9+ messages in thread From: Nicholas A. Bellinger @ 2015-08-14 1:13 UTC (permalink / raw) To: Greg KH Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran, swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez On Thu, 2015-08-13 at 18:09 -0700, Greg KH wrote: > On Thu, Aug 13, 2015 at 05:55:48PM -0700, Nicholas A. Bellinger wrote: > > On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote: > > > On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote: > > > > Hi Greg-KH, > > > > > > > > This is a follow up on a handful of 'WTF' emails for some of the recent > > > > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable. > > > > > > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted > > > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive > > > > df673274 qla2xxx: added sess generations to detect RSCN update races > > > > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode > > > > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO > > > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR > > > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted > > > > > > > > These are all bug-fix patches that address real-world correctness issues > > > > reported by a large customer of the qla2xxx target code, and have been > > > > reviewed + tested + signed-off-by the HW LLD maintainer. > > > > > > And they are all fricking huge rewrites and additions to the driver, > > > none of which actually look like they should be added to a stable tree > > > at all. > > > > Each of which addresses bugs that the largest consumer of the code (Pure > > Storage) has encountered in production. > > > > > > > > > Granted these patches are larger in size that I'd normally be > > > > comfortable with CC'ing for stable, but they do address real-world > > > > correctness issues seen in large scale production with qla2xxx target > > > > code. > > > > > > > > Please consider adding these to your stable queue, to go along with the > > > > qla2xxx target patches from the same series that have already made it > > > > into v4.1.5. > > > > > > What exactly are they fixing? They look like they add a ton of new > > > functions to the driver, and other features. What is so broken in the > > > driver today that warrants this type of exception to the rules? > > > > > > > The commit logs go into details about the exact issues, but my > > understanding is that it's a combination of OOPsen, potential > > data-corruption due to duplicate sessions, and couple different > > resource leaks. > > > > Pure Storage and Qlogic folks, would you be so kind as to comment more > > on the specifics for Greg-KH to consider..? > > I'd like the scsi maintainer to also approve such an exception, I notice > they didn't even sign off on these patches :( > James is aware of this series being merged via target-pending.git, and it's request for v3.18.y stable: http://marc.info/?l=linux-scsi&m=143774948202707&w=2 Thank you, --nab ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-14 1:13 ` Nicholas A. Bellinger @ 2015-08-14 1:21 ` Nicholas A. Bellinger 2015-08-14 16:12 ` Quinn Tran 0 siblings, 1 reply; 9+ messages in thread From: Nicholas A. Bellinger @ 2015-08-14 1:21 UTC (permalink / raw) To: Greg KH Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran, swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez On Thu, 2015-08-13 at 18:13 -0700, Nicholas A. Bellinger wrote: > On Thu, 2015-08-13 at 18:09 -0700, Greg KH wrote: > > On Thu, Aug 13, 2015 at 05:55:48PM -0700, Nicholas A. Bellinger wrote: > > > On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote: > > > > On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote: > > > > > Hi Greg-KH, > > > > > > > > > > This is a follow up on a handful of 'WTF' emails for some of the recent > > > > > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable. > > > > > > > > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted > > > > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive > > > > > df673274 qla2xxx: added sess generations to detect RSCN update races > > > > > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode > > > > > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO > > > > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR > > > > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted > > > > > > > > > > These are all bug-fix patches that address real-world correctness issues > > > > > reported by a large customer of the qla2xxx target code, and have been > > > > > reviewed + tested + signed-off-by the HW LLD maintainer. > > > > > > > > And they are all fricking huge rewrites and additions to the driver, > > > > none of which actually look like they should be added to a stable tree > > > > at all. > > > > > > Each of which addresses bugs that the largest consumer of the code (Pure > > > Storage) has encountered in production. > > > > > > > > > > > > Granted these patches are larger in size that I'd normally be > > > > > comfortable with CC'ing for stable, but they do address real-world > > > > > correctness issues seen in large scale production with qla2xxx target > > > > > code. > > > > > > > > > > Please consider adding these to your stable queue, to go along with the > > > > > qla2xxx target patches from the same series that have already made it > > > > > into v4.1.5. > > > > > > > > What exactly are they fixing? They look like they add a ton of new > > > > functions to the driver, and other features. What is so broken in the > > > > driver today that warrants this type of exception to the rules? > > > > > > > > > > The commit logs go into details about the exact issues, but my > > > understanding is that it's a combination of OOPsen, potential > > > data-corruption due to duplicate sessions, and couple different > > > resource leaks. > > > > > > Pure Storage and Qlogic folks, would you be so kind as to comment more > > > on the specifics for Greg-KH to consider..? > > > > I'd like the scsi maintainer to also approve such an exception, I notice > > they didn't even sign off on these patches :( > > > > James is aware of this series being merged via target-pending.git, and > it's request for v3.18.y stable: > > http://marc.info/?l=linux-scsi&m=143774948202707&w=2 > and here: http://marc.info/?l=linux-scsi&m=143623177430993&w=2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-14 1:21 ` Nicholas A. Bellinger @ 2015-08-14 16:12 ` Quinn Tran 0 siblings, 0 replies; 9+ messages in thread From: Quinn Tran @ 2015-08-14 16:12 UTC (permalink / raw) To: Nicholas A. Bellinger, Greg KH Cc: target-devel, stable@vger.kernel.org, Himanshu Madhani, alexei@purestorage.com, swapnil.nagle@purestorage.com, Roland Dreier, Giridhar Malavali, Andrew Vasquez Greg K-H, These changes were developed and come to agreement between Qlogic�s team and the Target customer. The customer had verified the fix in their environment in addition with our test environment. Even though existing code is in stable tree show no symptom. Under stress, we found a nest of bugs in this section of code. If you have not had a chance to inspect the individual fix, here�s quick summary on each: >> > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions >>are deleted QT> Data corruption fix. Commands from old session were criss cross with new session. CMD OXID from new session will reuse OXID from old session. Cmd from old session need to be flushed before new session is established. >> > > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi >>arrive QT> Stale pointer access fix. Driver can loose track of command when cmd is being submit between fabric driver and target core driver. When command shows up on the TCM side, resource/memory could disappear. The fix is to start cmd tracking when it�s alloced. >> > > > df673274 qla2xxx: added sess generations to detect RSCN update >>races QT> Race condition fix. Stale RSCN could step on a live session causing session to stop responding to new cmds. Add counter to prevent race condition. >> > > > d20ed91b qla2xxx: disable scsi_transport_fc registration in >>target mode QT> Reduce interference by the initiator personality of the driver that affect the Target side of the same driver. >> > > > 7359df25 qla2xxx: terminate exchange when command is aborted by >>LIO QT> FW resource/Exchange starvation fix. A change at the TCM API cause interaction problem. This cause improper cleanup at the HW level when SCSI Layer error condition arrive. >> > > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before >>processing TMR QT> Race condition fix. CMD & ABTS(i.e. abort) can get out of order that cause false error respond back to Initiator. This will lead to higher escalation of error recovery. >> > > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being >>deleted QT> Reduce false error. When session is being deleted, any stale respond from TCM send to HW will generate false error. The fix is to intercept the cmd and advance it to its final state. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-14 0:49 ` Greg KH 2015-08-14 0:55 ` Nicholas A. Bellinger @ 2015-08-14 18:37 ` Roland Dreier 2015-08-14 19:19 ` Greg KH 1 sibling, 1 reply; 9+ messages in thread From: Roland Dreier @ 2015-08-14 18:37 UTC (permalink / raw) To: Greg KH Cc: Nicholas A. Bellinger, target-devel, stable, Himanshu Madhani, Alexei Potashnik, Quinn Tran, Swapnil Nagle, Giridhar Malavali, Andrew Vasquez On Thu, Aug 13, 2015 at 5:49 PM, Greg KH <gregkh@linuxfoundation.org> wrote: > What exactly are they fixing? They look like they add a ton of new > functions to the driver, and other features. What is so broken in the > driver today that warrants this type of exception to the rules? Let me start by saying that I'm not sure these changes really belong in stable, and I totally understand why you're pushing back. However the fact is that upstream qla2xxx is utterly broken in target mode — trying to use Linux as a real FC target, where we have to deal with fabric changes, initiators going away, and so on, is hopeless without these changes. The target will crash, or data will be corrupted, or similarly severe issues. We were definitely derelict in getting our fixes upstream for a while, but they are there now. So using a new kernel has a chance at working. I think it's totally legitimate to say that the changes are too big for -stable, and someone who wants to use qla2xxx target mode needs a recent kernel or their own backports. But as I said, the bugs are pretty severe (kernel crash or worse) so I could see pulling the changes into -stable also. hth, - R. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode 2015-08-14 18:37 ` Roland Dreier @ 2015-08-14 19:19 ` Greg KH 0 siblings, 0 replies; 9+ messages in thread From: Greg KH @ 2015-08-14 19:19 UTC (permalink / raw) To: Roland Dreier Cc: Nicholas A. Bellinger, target-devel, stable, Himanshu Madhani, Alexei Potashnik, Quinn Tran, Swapnil Nagle, Giridhar Malavali, Andrew Vasquez On Fri, Aug 14, 2015 at 11:37:03AM -0700, Roland Dreier wrote: > On Thu, Aug 13, 2015 at 5:49 PM, Greg KH <gregkh@linuxfoundation.org> wrote: > > What exactly are they fixing? They look like they add a ton of new > > functions to the driver, and other features. What is so broken in the > > driver today that warrants this type of exception to the rules? > > Let me start by saying that I'm not sure these changes really belong > in stable, and I totally understand why you're pushing back. > > However the fact is that upstream qla2xxx is utterly broken in target > mode — trying to use Linux as a real FC target, where we have to deal > with fabric changes, initiators going away, and so on, is hopeless > without these changes. The target will crash, or data will be > corrupted, or similarly severe issues. > > We were definitely derelict in getting our fixes upstream for a while, > but they are there now. So using a new kernel has a chance at > working. I think it's totally legitimate to say that the changes are > too big for -stable, and someone who wants to use qla2xxx target mode > needs a recent kernel or their own backports. But as I said, the bugs > are pretty severe (kernel crash or worse) so I could see pulling the > changes into -stable also. Ok, so given that it seems this code has always been broken in this mode of operation, and the patches are huge, and they aren't even in a normally released kernel, I'm not going to take them into -stable at this point in time. Feel free to resend them after about 6 months or so, after the fallout of these major changes has made their way to the distros and users, and you have it all working correctly, and I'll be glad to revisit them. thanks, greg k-h ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-08-14 19:19 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-13 21:00 Stable backports for qla2xxx target mode Nicholas A. Bellinger 2015-08-14 0:49 ` Greg KH 2015-08-14 0:55 ` Nicholas A. Bellinger 2015-08-14 1:09 ` Greg KH 2015-08-14 1:13 ` Nicholas A. Bellinger 2015-08-14 1:21 ` Nicholas A. Bellinger 2015-08-14 16:12 ` Quinn Tran 2015-08-14 18:37 ` Roland Dreier 2015-08-14 19:19 ` Greg KH
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).