* Stable backports for qla2xxx target mode
@ 2015-08-13 21:00 Nicholas A. Bellinger
2015-08-14 0:49 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-13 21:00 UTC (permalink / raw)
To: Greg KH
Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran,
swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez
Hi Greg-KH,
This is a follow up on a handful of 'WTF' emails for some of the recent
qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable.
a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted
daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive
df673274 qla2xxx: added sess generations to detect RSCN update races
d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode
7359df25 qla2xxx: terminate exchange when command is aborted by LIO
8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR
e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted
These are all bug-fix patches that address real-world correctness issues
reported by a large customer of the qla2xxx target code, and have been
reviewed + tested + signed-off-by the HW LLD maintainer.
Granted these patches are larger in size that I'd normally be
comfortable with CC'ing for stable, but they do address real-world
correctness issues seen in large scale production with qla2xxx target
code.
Please consider adding these to your stable queue, to go along with the
qla2xxx target patches from the same series that have already made it
into v4.1.5.
Thank you,
--nab
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-13 21:00 Stable backports for qla2xxx target mode Nicholas A. Bellinger
@ 2015-08-14 0:49 ` Greg KH
2015-08-14 0:55 ` Nicholas A. Bellinger
2015-08-14 18:37 ` Roland Dreier
0 siblings, 2 replies; 9+ messages in thread
From: Greg KH @ 2015-08-14 0:49 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran,
swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez
On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote:
> Hi Greg-KH,
>
> This is a follow up on a handful of 'WTF' emails for some of the recent
> qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable.
>
> a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted
> daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive
> df673274 qla2xxx: added sess generations to detect RSCN update races
> d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode
> 7359df25 qla2xxx: terminate exchange when command is aborted by LIO
> 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR
> e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted
>
> These are all bug-fix patches that address real-world correctness issues
> reported by a large customer of the qla2xxx target code, and have been
> reviewed + tested + signed-off-by the HW LLD maintainer.
And they are all fricking huge rewrites and additions to the driver,
none of which actually look like they should be added to a stable tree
at all.
> Granted these patches are larger in size that I'd normally be
> comfortable with CC'ing for stable, but they do address real-world
> correctness issues seen in large scale production with qla2xxx target
> code.
>
> Please consider adding these to your stable queue, to go along with the
> qla2xxx target patches from the same series that have already made it
> into v4.1.5.
What exactly are they fixing? They look like they add a ton of new
functions to the driver, and other features. What is so broken in the
driver today that warrants this type of exception to the rules?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-14 0:49 ` Greg KH
@ 2015-08-14 0:55 ` Nicholas A. Bellinger
2015-08-14 1:09 ` Greg KH
2015-08-14 18:37 ` Roland Dreier
1 sibling, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-14 0:55 UTC (permalink / raw)
To: Greg KH
Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran,
swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez
On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote:
> On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote:
> > Hi Greg-KH,
> >
> > This is a follow up on a handful of 'WTF' emails for some of the recent
> > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable.
> >
> > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted
> > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive
> > df673274 qla2xxx: added sess generations to detect RSCN update races
> > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode
> > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO
> > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR
> > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted
> >
> > These are all bug-fix patches that address real-world correctness issues
> > reported by a large customer of the qla2xxx target code, and have been
> > reviewed + tested + signed-off-by the HW LLD maintainer.
>
> And they are all fricking huge rewrites and additions to the driver,
> none of which actually look like they should be added to a stable tree
> at all.
Each of which addresses bugs that the largest consumer of the code (Pure
Storage) has encountered in production.
>
> > Granted these patches are larger in size that I'd normally be
> > comfortable with CC'ing for stable, but they do address real-world
> > correctness issues seen in large scale production with qla2xxx target
> > code.
> >
> > Please consider adding these to your stable queue, to go along with the
> > qla2xxx target patches from the same series that have already made it
> > into v4.1.5.
>
> What exactly are they fixing? They look like they add a ton of new
> functions to the driver, and other features. What is so broken in the
> driver today that warrants this type of exception to the rules?
>
The commit logs go into details about the exact issues, but my
understanding is that it's a combination of OOPsen, potential
data-corruption due to duplicate sessions, and couple different
resource leaks.
Pure Storage and Qlogic folks, would you be so kind as to comment more
on the specifics for Greg-KH to consider..?
Thank you,
--nab
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-14 0:55 ` Nicholas A. Bellinger
@ 2015-08-14 1:09 ` Greg KH
2015-08-14 1:13 ` Nicholas A. Bellinger
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2015-08-14 1:09 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran,
swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez
On Thu, Aug 13, 2015 at 05:55:48PM -0700, Nicholas A. Bellinger wrote:
> On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote:
> > On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote:
> > > Hi Greg-KH,
> > >
> > > This is a follow up on a handful of 'WTF' emails for some of the recent
> > > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable.
> > >
> > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted
> > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive
> > > df673274 qla2xxx: added sess generations to detect RSCN update races
> > > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode
> > > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO
> > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR
> > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted
> > >
> > > These are all bug-fix patches that address real-world correctness issues
> > > reported by a large customer of the qla2xxx target code, and have been
> > > reviewed + tested + signed-off-by the HW LLD maintainer.
> >
> > And they are all fricking huge rewrites and additions to the driver,
> > none of which actually look like they should be added to a stable tree
> > at all.
>
> Each of which addresses bugs that the largest consumer of the code (Pure
> Storage) has encountered in production.
>
> >
> > > Granted these patches are larger in size that I'd normally be
> > > comfortable with CC'ing for stable, but they do address real-world
> > > correctness issues seen in large scale production with qla2xxx target
> > > code.
> > >
> > > Please consider adding these to your stable queue, to go along with the
> > > qla2xxx target patches from the same series that have already made it
> > > into v4.1.5.
> >
> > What exactly are they fixing? They look like they add a ton of new
> > functions to the driver, and other features. What is so broken in the
> > driver today that warrants this type of exception to the rules?
> >
>
> The commit logs go into details about the exact issues, but my
> understanding is that it's a combination of OOPsen, potential
> data-corruption due to duplicate sessions, and couple different
> resource leaks.
>
> Pure Storage and Qlogic folks, would you be so kind as to comment more
> on the specifics for Greg-KH to consider..?
I'd like the scsi maintainer to also approve such an exception, I notice
they didn't even sign off on these patches :(
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-14 1:09 ` Greg KH
@ 2015-08-14 1:13 ` Nicholas A. Bellinger
2015-08-14 1:21 ` Nicholas A. Bellinger
0 siblings, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-14 1:13 UTC (permalink / raw)
To: Greg KH
Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran,
swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez
On Thu, 2015-08-13 at 18:09 -0700, Greg KH wrote:
> On Thu, Aug 13, 2015 at 05:55:48PM -0700, Nicholas A. Bellinger wrote:
> > On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote:
> > > On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote:
> > > > Hi Greg-KH,
> > > >
> > > > This is a follow up on a handful of 'WTF' emails for some of the recent
> > > > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable.
> > > >
> > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted
> > > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive
> > > > df673274 qla2xxx: added sess generations to detect RSCN update races
> > > > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode
> > > > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO
> > > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR
> > > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted
> > > >
> > > > These are all bug-fix patches that address real-world correctness issues
> > > > reported by a large customer of the qla2xxx target code, and have been
> > > > reviewed + tested + signed-off-by the HW LLD maintainer.
> > >
> > > And they are all fricking huge rewrites and additions to the driver,
> > > none of which actually look like they should be added to a stable tree
> > > at all.
> >
> > Each of which addresses bugs that the largest consumer of the code (Pure
> > Storage) has encountered in production.
> >
> > >
> > > > Granted these patches are larger in size that I'd normally be
> > > > comfortable with CC'ing for stable, but they do address real-world
> > > > correctness issues seen in large scale production with qla2xxx target
> > > > code.
> > > >
> > > > Please consider adding these to your stable queue, to go along with the
> > > > qla2xxx target patches from the same series that have already made it
> > > > into v4.1.5.
> > >
> > > What exactly are they fixing? They look like they add a ton of new
> > > functions to the driver, and other features. What is so broken in the
> > > driver today that warrants this type of exception to the rules?
> > >
> >
> > The commit logs go into details about the exact issues, but my
> > understanding is that it's a combination of OOPsen, potential
> > data-corruption due to duplicate sessions, and couple different
> > resource leaks.
> >
> > Pure Storage and Qlogic folks, would you be so kind as to comment more
> > on the specifics for Greg-KH to consider..?
>
> I'd like the scsi maintainer to also approve such an exception, I notice
> they didn't even sign off on these patches :(
>
James is aware of this series being merged via target-pending.git, and
it's request for v3.18.y stable:
http://marc.info/?l=linux-scsi&m=143774948202707&w=2
Thank you,
--nab
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-14 1:13 ` Nicholas A. Bellinger
@ 2015-08-14 1:21 ` Nicholas A. Bellinger
2015-08-14 16:12 ` Quinn Tran
0 siblings, 1 reply; 9+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-14 1:21 UTC (permalink / raw)
To: Greg KH
Cc: target-devel, stable, himanshu.madhani, alexei, quinn.tran,
swapnil.nagle, Roland Dreier, Giridhar Malavali, Andrew Vasquez
On Thu, 2015-08-13 at 18:13 -0700, Nicholas A. Bellinger wrote:
> On Thu, 2015-08-13 at 18:09 -0700, Greg KH wrote:
> > On Thu, Aug 13, 2015 at 05:55:48PM -0700, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-08-13 at 17:49 -0700, Greg KH wrote:
> > > > On Thu, Aug 13, 2015 at 02:00:45PM -0700, Nicholas A. Bellinger wrote:
> > > > > Hi Greg-KH,
> > > > >
> > > > > This is a follow up on a handful of 'WTF' emails for some of the recent
> > > > > qla2xxx target patches from v4.2-rc5 CC'ed for v3.18.y stable.
> > > > >
> > > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions are deleted
> > > > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrive
> > > > > df673274 qla2xxx: added sess generations to detect RSCN update races
> > > > > d20ed91b qla2xxx: disable scsi_transport_fc registration in target mode
> > > > > 7359df25 qla2xxx: terminate exchange when command is aborted by LIO
> > > > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before processing TMR
> > > > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being deleted
> > > > >
> > > > > These are all bug-fix patches that address real-world correctness issues
> > > > > reported by a large customer of the qla2xxx target code, and have been
> > > > > reviewed + tested + signed-off-by the HW LLD maintainer.
> > > >
> > > > And they are all fricking huge rewrites and additions to the driver,
> > > > none of which actually look like they should be added to a stable tree
> > > > at all.
> > >
> > > Each of which addresses bugs that the largest consumer of the code (Pure
> > > Storage) has encountered in production.
> > >
> > > >
> > > > > Granted these patches are larger in size that I'd normally be
> > > > > comfortable with CC'ing for stable, but they do address real-world
> > > > > correctness issues seen in large scale production with qla2xxx target
> > > > > code.
> > > > >
> > > > > Please consider adding these to your stable queue, to go along with the
> > > > > qla2xxx target patches from the same series that have already made it
> > > > > into v4.1.5.
> > > >
> > > > What exactly are they fixing? They look like they add a ton of new
> > > > functions to the driver, and other features. What is so broken in the
> > > > driver today that warrants this type of exception to the rules?
> > > >
> > >
> > > The commit logs go into details about the exact issues, but my
> > > understanding is that it's a combination of OOPsen, potential
> > > data-corruption due to duplicate sessions, and couple different
> > > resource leaks.
> > >
> > > Pure Storage and Qlogic folks, would you be so kind as to comment more
> > > on the specifics for Greg-KH to consider..?
> >
> > I'd like the scsi maintainer to also approve such an exception, I notice
> > they didn't even sign off on these patches :(
> >
>
> James is aware of this series being merged via target-pending.git, and
> it's request for v3.18.y stable:
>
> http://marc.info/?l=linux-scsi&m=143774948202707&w=2
>
and here:
http://marc.info/?l=linux-scsi&m=143623177430993&w=2
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-14 1:21 ` Nicholas A. Bellinger
@ 2015-08-14 16:12 ` Quinn Tran
0 siblings, 0 replies; 9+ messages in thread
From: Quinn Tran @ 2015-08-14 16:12 UTC (permalink / raw)
To: Nicholas A. Bellinger, Greg KH
Cc: target-devel, stable@vger.kernel.org, Himanshu Madhani,
alexei@purestorage.com, swapnil.nagle@purestorage.com,
Roland Dreier, Giridhar Malavali, Andrew Vasquez
Greg K-H,
These changes were developed and come to agreement between Qlogic�s team
and the Target customer. The customer had verified the fix in their
environment in addition with our test environment. Even though existing
code is in stable tree show no symptom. Under stress, we found a nest of
bugs in this section of code.
If you have not had a chance to inspect the individual fix, here�s quick
summary on each:
>> > > > a6ca8878 qla2xxx: delay plogi/prli ack until existing sessions
>>are deleted
QT> Data corruption fix. Commands from old session were criss cross with
new session. CMD OXID from new session will reuse OXID from old session.
Cmd from old session need to be flushed before new session is established.
>> > > > daddf5cf qla2xxx: Abort stale cmds on qla_tgt_wq when plogi
>>arrive
QT> Stale pointer access fix. Driver can loose track of command when cmd
is being submit between fabric driver and target core driver. When
command shows up on the TCM side, resource/memory could disappear. The
fix is to start cmd tracking when it�s alloced.
>> > > > df673274 qla2xxx: added sess generations to detect RSCN update
>>races
QT> Race condition fix. Stale RSCN could step on a live session causing
session to stop responding to new cmds. Add counter to prevent race
condition.
>> > > > d20ed91b qla2xxx: disable scsi_transport_fc registration in
>>target mode
QT> Reduce interference by the initiator personality of the driver that
affect the Target side of the same driver.
>> > > > 7359df25 qla2xxx: terminate exchange when command is aborted by
>>LIO
QT> FW resource/Exchange starvation fix. A change at the TCM API cause
interaction problem. This cause improper cleanup at the HW level when
SCSI Layer error condition arrive.
>> > > > 8b2f5ff3 qla2xxx: cleanup cmd in qla workqueue before
>>processing TMR
QT> Race condition fix. CMD & ABTS(i.e. abort) can get out of order that
cause false error respond back to Initiator. This will lead to higher
escalation of error recovery.
>> > > > e52a8b45 qla2xxx: drop cmds/tmrs arrived while session is being
>>deleted
QT> Reduce false error. When session is being deleted, any stale respond
from TCM send to HW will generate false error. The fix is to intercept
the cmd and advance it to its final state.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-14 0:49 ` Greg KH
2015-08-14 0:55 ` Nicholas A. Bellinger
@ 2015-08-14 18:37 ` Roland Dreier
2015-08-14 19:19 ` Greg KH
1 sibling, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2015-08-14 18:37 UTC (permalink / raw)
To: Greg KH
Cc: Nicholas A. Bellinger, target-devel, stable, Himanshu Madhani,
Alexei Potashnik, Quinn Tran, Swapnil Nagle, Giridhar Malavali,
Andrew Vasquez
On Thu, Aug 13, 2015 at 5:49 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> What exactly are they fixing? They look like they add a ton of new
> functions to the driver, and other features. What is so broken in the
> driver today that warrants this type of exception to the rules?
Let me start by saying that I'm not sure these changes really belong
in stable, and I totally understand why you're pushing back.
However the fact is that upstream qla2xxx is utterly broken in target
mode — trying to use Linux as a real FC target, where we have to deal
with fabric changes, initiators going away, and so on, is hopeless
without these changes. The target will crash, or data will be
corrupted, or similarly severe issues.
We were definitely derelict in getting our fixes upstream for a while,
but they are there now. So using a new kernel has a chance at
working. I think it's totally legitimate to say that the changes are
too big for -stable, and someone who wants to use qla2xxx target mode
needs a recent kernel or their own backports. But as I said, the bugs
are pretty severe (kernel crash or worse) so I could see pulling the
changes into -stable also.
hth,
- R.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Stable backports for qla2xxx target mode
2015-08-14 18:37 ` Roland Dreier
@ 2015-08-14 19:19 ` Greg KH
0 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2015-08-14 19:19 UTC (permalink / raw)
To: Roland Dreier
Cc: Nicholas A. Bellinger, target-devel, stable, Himanshu Madhani,
Alexei Potashnik, Quinn Tran, Swapnil Nagle, Giridhar Malavali,
Andrew Vasquez
On Fri, Aug 14, 2015 at 11:37:03AM -0700, Roland Dreier wrote:
> On Thu, Aug 13, 2015 at 5:49 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > What exactly are they fixing? They look like they add a ton of new
> > functions to the driver, and other features. What is so broken in the
> > driver today that warrants this type of exception to the rules?
>
> Let me start by saying that I'm not sure these changes really belong
> in stable, and I totally understand why you're pushing back.
>
> However the fact is that upstream qla2xxx is utterly broken in target
> mode — trying to use Linux as a real FC target, where we have to deal
> with fabric changes, initiators going away, and so on, is hopeless
> without these changes. The target will crash, or data will be
> corrupted, or similarly severe issues.
>
> We were definitely derelict in getting our fixes upstream for a while,
> but they are there now. So using a new kernel has a chance at
> working. I think it's totally legitimate to say that the changes are
> too big for -stable, and someone who wants to use qla2xxx target mode
> needs a recent kernel or their own backports. But as I said, the bugs
> are pretty severe (kernel crash or worse) so I could see pulling the
> changes into -stable also.
Ok, so given that it seems this code has always been broken in this mode
of operation, and the patches are huge, and they aren't even in a
normally released kernel, I'm not going to take them into -stable at
this point in time.
Feel free to resend them after about 6 months or so, after the fallout
of these major changes has made their way to the distros and users, and
you have it all working correctly, and I'll be glad to revisit them.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-08-14 19:19 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-13 21:00 Stable backports for qla2xxx target mode Nicholas A. Bellinger
2015-08-14 0:49 ` Greg KH
2015-08-14 0:55 ` Nicholas A. Bellinger
2015-08-14 1:09 ` Greg KH
2015-08-14 1:13 ` Nicholas A. Bellinger
2015-08-14 1:21 ` Nicholas A. Bellinger
2015-08-14 16:12 ` Quinn Tran
2015-08-14 18:37 ` Roland Dreier
2015-08-14 19:19 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).